Posted to dev@kafka.apache.org by Liam Hodges <lh...@binghamton.edu.INVALID> on 2023/08/11 00:16:17 UTC

What are the biggest issues with Apache Kafka?

I'm working with a small team of engineers looking to contribute to the
open source tools for Apache Kafka. What is missing in the Kafka community
right now? Are there any problems an open source project could solve for
its developers? Appreciate all feedback.

Re: What are the biggest issues with Apache Kafka?

Posted by Arpit Goyal <go...@gmail.com>.
Thanks, Philip.

Thanks and regards,
Arpit Goyal



Re: What are the biggest issues with Apache Kafka?

Posted by Philip Nee <ph...@gmail.com>.
Hi Arpit: Here's a good starting point for KRaft-related issues:
https://issues.apache.org/jira/browse/KAFKA-15401?jql=project%20%3D%20KAFKA%20AND%20component%20in%20(core%2C%20kraft)

The recent Jenkins builds have also been extremely flaky, which greatly
impacts productivity. This would also be an important area to work on.
See here:
https://issues.apache.org/jira/browse/KAFKA-15378?jql=project%20%3D%20KAFKA%20AND%20component%20in%20(%22system%20tests%22%2C%20%22unit%20tests%22)


Re: What are the biggest issues with Apache Kafka?

Posted by Arpit Goyal <go...@gmail.com>.
Hi Colin,
Is there a ticket open for this?


Re: What are the biggest issues with Apache Kafka?

Posted by Colin McCabe <cm...@apache.org>.
If you want to get more familiar with the code base, one great way is to convert more integration tests to KRaft mode.

:)

best,
Colin


Re: What are the biggest issues with Apache Kafka?

Posted by Philip Nee <ph...@gmail.com>.
Hey Liam,

I think the Apache Jira would be a great place to find low-hanging fruit.
We also have lots of flaky tests to resolve. What is your familiarity with
Kafka? If you are new, I would suggest looking at the demos and examples in
the repo, and maybe trying to improve them. There is also a lot of work being
done on KRaft and tiered storage, though I'm not familiar with them.

P


Re: What are the biggest issues with Apache Kafka?

Posted by Divij Vaidya <di...@gmail.com>.
Hey Liam

Thanks for asking this question. I have been meaning to write a post to the
community for a long time about open areas where newcomers can contribute,
but it never made it to the top of my to-do list.

In addition to what others mentioned above, here are a few options to
pick from. It's not an exhaustive list, and I would be able to help more if
you tell me what you folks are interested in working on (e.g. server,
client side, streams etc.) and how familiar you currently are with the Kafka
code base. I can personally provide rapid reviews for option 1 and option
3, since those are the ones I feel most passionate about, but I can't promise
a time commitment from my side for the other options.

*Option 1: KIP-405 (Tiered Storage) related work*

We are targeting an early access [1] release for KIP-405 [2] (tiered
storage in Kafka) in the upcoming 3.6 release. There is a lot of work
left to polish this feature and make it production ready. If you like, you
can help over there. You can pick up any "unassigned" ticket from
https://issues.apache.org/jira/browse/KAFKA-7739 OR pick up a ticket where
the assignee hasn't provided an update in the last month.

*Option 2: Metrics related work*

We currently use two different ways of capturing metrics on the
broker/server. Historically we started with Yammer, then moved to
KafkaMetrics on the clients, and more recently we started using
KafkaMetrics on the broker too. Currently the majority of broker metrics use
Yammer (which has its own set of problems, such as relying on a 10-year-old
library), but the alternative, KafkaMetrics, has a slow histogram (see the
KAFKA-15192 comment linked in the summary below).
Here's a recent discussion about this:
https://lists.apache.org/thread/jww851jcyjtsq010bbt81b5dgwzqrgwx and
https://lists.apache.org/thread/f5wknqhmoo5lml99np7ksocz7fyk3m0r. You will
find that on the broker, KafkaRaftMetrics uses KafkaMetrics but
QuorumControllerMetrics uses Yammer metrics. We need someone in the
community to pick up unifying these so that we can use only one
methodology going forward. My recommendation would be to upgrade the Yammer
library to the latest Dropwizard library, as proposed in
https://cwiki.apache.org/confluence/display/KAFKA/KIP-510%3A+Metrics+library+upgrade
but there are backward compatibility problems associated with it. My
colleague Christo has done some digging on this in the past and found that
the major obstacle to completing KIP-510 is the usage of
https://github.com/xvrl/kafka/blob/01208fd218286d2cd318a891f2cb5883422283b1/core/src/main/java/kafka/metrics/FilteringJmxReporter.java
introduced in KIP-544. This functionality is no longer directly available
in Dropwizard 4.2.0.
Can you dig more into this and see if there is a way to upgrade without
impacting backward compatibility?

To summarise option 2, we have the following problems:
1. We use a 10-year-old version of a library for capturing Yammer metrics
2. Histogram calculation in metrics is very expensive. See:
https://issues.apache.org/jira/browse/KAFKA-15192?focusedCommentId=17744169&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17744169
3. KafkaMetrics library and Yammer metrics both have downsides as captured
in https://issues.apache.org/jira/browse/KAFKA-15058,
https://issues.apache.org/jira/browse/KAFKA-15154 and

*Option 3: Zero copy over SSL*

This is more of a personal project which I am not finding time to finish.
Today Kafka's zero-copy transfer does not work when SSL is enabled. However,
there is a path forward on newer Linux kernels by using kTLS. My idea is to
have Kafka use dynamically bound OpenSSL (>=3.0) via netty-tcnative. OpenSSL
3.0 and above can be compiled with the ability to enable kTLS. Hence, it
should be possible to use Kafka + netty-tcnative + OpenSSL compiled with kTLS
support on the OS to enable zero copy even for SSL workloads. I can fill you
in if this is something that you are interested in pursuing.
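
For readers new to the term, here is a minimal sketch of what zero copy
means on the JVM. It is an illustration only, not Kafka code: the names
ZeroCopySketch and sendFile are made up, and it assumes a blocking target
channel. FileChannel.transferTo is the standard java.nio call that lets the
OS move file bytes to a channel without copying them through user space
where the platform supports it. When TLS runs in user space the application
has to encrypt the bytes itself, so this shortcut is not available, and that
is the gap kTLS could close.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration; not part of Kafka.
final class ZeroCopySketch {

    // Sends the whole file to the target channel and returns the number
    // of bytes transferred. Assumes the target is a blocking channel.
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            while (position < size) {
                // transferTo may move fewer bytes than requested, so loop.
                long sent = source.transferTo(position, size - position, target);
                if (sent <= 0) {
                    break; // nothing moved; stop rather than spin
                }
                position += sent;
            }
            return position;
        }
    }
}
```

When the target is a plain socket channel the kernel can often serve this
with sendfile; for other channel types the JDK falls back to an ordinary
copy, so the call stays correct but is no longer zero copy.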

*Option 4: Getting rid of EasyMock and PowerMock dependencies from Kafka*

We have been making slow and steady progress towards this goal, and it is
being tracked in https://issues.apache.org/jira/browse/KAFKA-7438.
But it has been slow moving, either because of limited code reviewer
bandwidth or because of a lack of folks implementing the tests. We could use
your help in bringing it across the finish line.


[1]
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes
[2]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage

--
Divij Vaidya





Re: What are the biggest issues with Apache Kafka?

Posted by ziming deng <de...@gmail.com>.
Hi Liam,

The Apache Kafka project has several modules, so I think you should first select a module you are interested in.

For example, we are currently working on KIP-500 related features, which include
1. KIP-856: KRaft Disk Failure Recovery,
2. KIP-642: Dynamic quorum reassignment,
3. kafka-metadata-shell.sh,
4. KIP-866: ZooKeeper to KRaft Migration,
5. KIP-858: Handle JBOD broker disk failure in KRaft
6. Migrating test cases to support KRaft mode
7. KRaft transactions

We even have the idea of implementing multi-Raft and using it to replace the Kafka replica protocol. Apart from KRaft, you can also explore tiered storage, Kafka Streams, Kafka Connect, the group coordinator, and the transaction coordinator, which are also under rapid iteration.

--
Best,
Ziming

