Posted to dev@kafka.apache.org by Yuanjin Lin <li...@zhihu.com> on 2018/11/02 10:14:16 UTC

[DISCUSS] How hard is it to separate the logic layer and the storage layer of Kafka broker?

Hi all,

I am a software engineer from Zhihu.com. Kafka is so great and used heavily
in Zhihu. There are probably over 2K Kafka brokers in total.

However, we are suffering from the problem that performance degrades
rapidly when the number of topics increases (sadly, we are using HDDs). We
are considering separating the logic layer and the storage layer of the
Kafka broker, like Apache Pulsar.

After the modification, a server may host several Kafka brokers and more
topics. Those brokers would all connect to a single storage engine via RPC.
That storage engine can do the load-balancing work easily and avoid creating
too many files, which hurts HDDs.

Is it hard? I think replacing the stuff in `Kafka.Log` would be enough,
right?
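
To make the idea concrete, here is a rough sketch of the kind of storage
interface a broker might call over RPC instead of writing to local segment
files. This is purely illustrative: the interface name, methods, and types
are assumptions of mine, not anything that exists in Kafka today.

// Hypothetical storage-layer API, for illustration only. A broker would
// call this over RPC instead of managing local segment files itself.
public interface RemoteLogStorage {

    // Append a batch of records for one topic-partition and return the
    // base offset assigned to the batch.
    long append(String topic, int partition, byte[] recordBatch);

    // Read up to maxBytes of records starting at the given offset.
    byte[] read(String topic, int partition, long startOffset, int maxBytes);

    // Return the current log end offset of the partition.
    long logEndOffset(String topic, int partition);

    // Delete data below the given offset, e.g. for retention.
    void truncateBefore(String topic, int partition, long offset);
}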

Regards,
Lin.

Re: [DISCUSS] How hard is it to separate the logic layer and the storage layer of Kafka broker?

Posted by Jun Rao <ju...@confluent.io>.
Hi, Yuanjin,

I assume the performance issue that you mentioned is for writes instead of
reads. By default, Kafka flushes data to disks asynchronously in batches.
So, even when there are multiple files to write, the batching can amortize
the HDD seek overhead. It would be useful to understand the number of I/Os
and the size of each I/O in your environment.
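
As a complementary, producer-side illustration of that batching (separate
from the broker's asynchronous flush path), the sketch below configures
larger, less frequent writes so each HDD seek amortizes over more data. The
specific values are examples only and would need to be tuned against your
own I/O measurements.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class BatchedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);

        // Bigger batches plus a short linger mean fewer, larger produce
        // requests, which is usually friendlier to HDDs. Example values only.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 256 * 1024);
        props.put(ProducerConfig.LINGER_MS_CONFIG, 50);
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            // Send records as usual; batching happens transparently.
        }
    }
}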

Thanks,

Jun

On Fri, Nov 2, 2018 at 9:50 PM, Yuanjin Lin <li...@zhihu.com> wrote:

> Colin, Thanks for the meaningful reply!
>
> We are 100% sure those HDDs are the bottleneck; almost 90% of our alerts are
> about HDDs, and I am the one who has to deal with them. The common scenario
> is 100-400 partitions per HDD (2 TB each). For historical reasons, developers
> in my company tend to put everything into Kafka because it makes them feel
> safe. Although we have over 2K servers for Kafka, I still receive alerts
> every day.
>
> If the modification I proposed is not too hard, I will begin to do it next
> month.
>
>
> On Sat, Nov 3, 2018 at 1:36 AM Colin McCabe <cm...@apache.org> wrote:
>
> > On Fri, Nov 2, 2018, at 03:14, Yuanjin Lin wrote:
> > > Hi all,
> > >
> > > I am a software engineer from Zhihu.com. Kafka is so great and used
> > > heavily in Zhihu. There are probably over 2K Kafka brokers in total.
> > >
> > > However, we are suffering from the problem that the performance degrades
> > > rapidly when the number of topics increases (sadly, we are using HDD).
> >
> > Hi Yuanjin,
> >
> > How many partitions are you trying to create?
> >
> > Do you have benchmarks confirming that disk I/O is your bottleneck?  There
> > are a few cases where large numbers of partitions may impose CPU and
> > garbage collection burdens.  The patch on
> > https://github.com/apache/kafka/pull/5206 illustrates one of them.
> >
> > > We are considering separating the logic layer and the storage layer of
> > > Kafka broker like Apache Pulsar.
> > >
> > > After the modification, a server may have several Kafka brokers and more
> > > topics. Those brokers all connect to a sole storage engine via RPC. The
> > > sole storage can do the load balancing work easily, and avoid creating
> > > too many files which hurts HDD.
> > >
> > > Is it hard? I think replacing the stuff in `Kafka.Log` would be enough,
> > > right?
> >
> > It would help to know what the problem is here.  If the problem is a large
> > number of files, then maybe the simplest approach would be creating fewer
> > files.  You don't need to introduce a new layer of servers in order to do
> > that.  You could use something like RocksDB to store messages and indices,
> > or create your own file format which combined together things which were
> > previously separate.  For example, we could combine the timeindex and index
> > files.
> >
> > As I understand it, Pulsar made the decision to combine together data from
> > multiple partitions in a single file.  Sometimes a very large number of
> > partitions.  This is great for writing, but not so good if you want to read
> > historical data from a single topic.
> >
> > regards,
> > Colin
> >
>

Re: [DISCUSS] How hard is it to separate the logic layer and the storage layer of Kafka broker?

Posted by Yuanjin Lin <li...@zhihu.com>.
Colin, Thanks for the meaningful reply!

We are 100% sure those HDDs are the bottleneck; almost 90% of our alerts are
about HDDs, and I am the one who has to deal with them. The common scenario
is 100-400 partitions per HDD (2 TB each). For historical reasons, developers
in my company tend to put everything into Kafka because it makes them feel
safe. Although we have over 2K servers for Kafka, I still receive alerts
every day.

If the modification I proposed is not too hard, I will begin to do it next
month.


On Sat, Nov 3, 2018 at 1:36 AM Colin McCabe <cm...@apache.org> wrote:

> On Fri, Nov 2, 2018, at 03:14, Yuanjin Lin wrote:
> > Hi all,
> >
> > I am a software engineer from Zhihu.com. Kafka is so great and used
> > heavily in Zhihu. There are probably over 2K Kafka brokers in total.
> >
> > However, we are suffering from the problem that the performance degrades
> > rapidly when the number of topics increases (sadly, we are using HDD).
>
> Hi Yuanjin,
>
> How many partitions are you trying to create?
>
> Do you have benchmarks confirming that disk I/O is your bottleneck?  There
> are a few cases where large numbers of partitions may impose CPU and
> garbage collection burdens.  The patch on
> https://github.com/apache/kafka/pull/5206 illustrates one of them.
>
> > We are considering separating the logic layer and the storage layer of
> > Kafka broker like Apache Pulsar.
> >
> > After the modification, a server may have several Kafka brokers and more
> > topics. Those brokers all connect to a sole storage engine via RPC. The
> > sole storage can do the load balancing work easily, and avoid creating
> > too many files which hurts HDD.
> >
> > Is it hard? I think replacing the stuff in `Kafka.Log` would be enough,
> > right?
>
> It would help to know what the problem is here.  If the problem is a large
> number of files, then maybe the simplest approach would be creating fewer
> files.  You don't need to introduce a new layer of servers in order to do
> that.  You could use something like RocksDB to store messages and indices,
> or create your own file format which combined together things which were
> previously separate.  For example, we could combine the timeindex and index
> files.
>
> As I understand it, Pulsar made the decision to combine together data from
> multiple partitions in a single file.  Sometimes a very large number of
> partitions.  This is great for writing, but not so good if you want to read
> historical data from a single topic.
>
> regards,
> Colin
>

Re: [DISCUSS] How hard is it to separate the logic layer and the storage layer of Kafka broker?

Posted by Colin McCabe <cm...@apache.org>.
On Fri, Nov 2, 2018, at 03:14, Yuanjin Lin wrote:
> Hi all,
> 
> I am a software engineer from Zhihu.com. Kafka is so great and used heavily
> in Zhihu. There are probably over 2K Kafka brokers in total.
> 
> However, we are suffering from the problem that the performance degrades
> rapidly when the number of topics increases (sadly, we are using HDD).

Hi Yuanjin,

How many partitions are you trying to create?

Do you have benchmarks confirming that disk I/O is your bottleneck?  There are a few cases where large numbers of partitions may impose CPU and garbage collection burdens.  The patch on https://github.com/apache/kafka/pull/5206 illustrates one of them.

> We are considering separating the logic layer and the storage layer of Kafka
> broker like Apache Pulsar.
> 
> After the modification, a server may have several Kafka brokers and more
> topics. Those brokers all connect to a sole storage engine via RPC. The
> sole storage can do the load balancing work easily, and avoid creating too
> many files which hurts HDD.
> 
> Is it hard? I think replacing the stuff in `Kafka.Log` would be enough,
> right?

It would help to know what the problem is here.  If the problem is a large number of files, then maybe the simplest approach would be creating fewer files.  You don't need to introduce a new layer of servers in order to do that.  You could use something like RocksDB to store messages and indices, or create your own file format that combines things which are currently kept separate.  For example, we could combine the timeindex and index files.
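
Purely as an illustration of the RocksDB idea (this class and its key layout
are my own sketch, not existing Kafka code), records from many partitions
could share a single RocksDB instance keyed by topic, partition, and offset,
so the number of open files no longer grows with the partition count:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class RocksDbLogSketch implements AutoCloseable {
    private final RocksDB db;

    public RocksDbLogSketch(String path) throws RocksDBException {
        RocksDB.loadLibrary();
        Options options = new Options().setCreateIfMissing(true);
        this.db = RocksDB.open(options, path);
    }

    // Key layout: topic bytes + partition (4 bytes) + offset (8 bytes).
    // RocksDB stores keys in sorted order, so all keys sharing the
    // topic+partition prefix are contiguous and can be range-scanned in
    // offset order.
    private static byte[] key(String topic, int partition, long offset) {
        byte[] t = topic.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(t.length + 4 + 8)
                .put(t).putInt(partition).putLong(offset).array();
    }

    public void append(String topic, int partition, long offset, byte[] record)
            throws RocksDBException {
        db.put(key(topic, partition, offset), record);
    }

    public byte[] read(String topic, int partition, long offset)
            throws RocksDBException {
        return db.get(key(topic, partition, offset));
    }

    @Override
    public void close() {
        db.close();
    }
}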

As I understand it, Pulsar made the decision to combine data from multiple partitions, sometimes a very large number of them, into a single file.  This is great for writing, but not so good if you want to read historical data from a single topic.

regards,
Colin