Posted to dev@pulsar.apache.org by Enrico Olivelli <eo...@gmail.com> on 2021/02/12 07:47:53 UTC

Working on Kafka Connector

Hello everyone,
here in our Pulsar repository we have a simple Kafka Connector for Pulsar
IO, composed of a Sink and a Source.
https://github.com/apache/pulsar/tree/master/pulsar-io/kafka

I have started to work on a set of enhancements to this connector in order
to make it more powerful and to better fit the needs of enterprise users.

The first patch I have submitted adds support for Avro-encoded messages
and the Confluent Schema Registry in the KafkaSource:
https://github.com/apache/pulsar/pull/9448

This patch is only the first step of a larger effort that we need in
order to have a fully usable Connector for non-trivial use cases.

I will be happy to follow up with other patches, and especially to draw up
a small roadmap of the features that we want to implement and provide to
the community.

Please take a look at the patch and share your thoughts.

Regards
Enrico

Re: Working on Kafka Connector

Posted by Enrico Olivelli <eo...@gmail.com>.
I have sent the second part of the patch about the Kafka Source
https://github.com/apache/pulsar/pull/10002

With this second patch we are able to support non-String keys, and we
also apply Schema information to the Pulsar topic.

When the key deserializer is StringDeserializer, we use the decoded key
as the Pulsar key.
When the key deserializer is not StringDeserializer, we use the Pulsar
KeyValue data type with SEPARATED key encoding.

This way the topic has a Schema for the key and a Schema for the value.
The key is encoded into the Pulsar key (SEPARATED), so it is used
for routing and for compaction.
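
As a rough illustration of the key-handling rule (a minimal sketch, not the
code in the PR): the class KeyMappingSketch, its Plan type and the plan()
method are hypothetical names, and the Base64 step is only an assumed way of
carrying non-String key bytes as text. Only the two Kafka deserializer class
names are real, and they are used as plain strings so the sketch has no
Kafka or Pulsar dependency.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical model of the key-handling rule; none of these names come from the PR. */
final class KeyMappingSketch {

    /** Real Kafka class name, compared as text so the sketch needs no Kafka dependency. */
    static final String STRING_DESERIALIZER =
            "org.apache.kafka.common.serialization.StringDeserializer";

    /** What the source would hand to Pulsar for one record key. */
    static final class Plan {
        final String messageKey;         // value placed in the Pulsar message key
        final boolean keyValueSeparated; // true when the topic uses KeyValue with SEPARATED encoding

        Plan(String messageKey, boolean keyValueSeparated) {
            this.messageKey = messageKey;
            this.keyValueSeparated = keyValueSeparated;
        }
    }

    static Plan plan(String keyDeserializerClass, byte[] rawKey) {
        if (STRING_DESERIALIZER.equals(keyDeserializerClass)) {
            // String keys: the decoded text is used directly as the Pulsar key.
            return new Plan(new String(rawKey, StandardCharsets.UTF_8), false);
        }
        // Any other key type: the topic gets a KeyValue schema and the key bytes
        // travel in the message key. Base64 is an assumption made for illustration.
        return new Plan(Base64.getEncoder().encodeToString(rawKey), true);
    }

    public static void main(String[] args) {
        Plan text = plan(STRING_DESERIALIZER, "order-42".getBytes(StandardCharsets.UTF_8));
        Plan binary = plan("org.apache.kafka.common.serialization.LongDeserializer",
                new byte[] {0, 0, 0, 0, 0, 0, 0, 42});
        System.out.println(text.messageKey + " separated=" + text.keyValueSeparated);
        System.out.println(binary.messageKey + " separated=" + binary.keyValueSeparated);
    }
}
```

In a real connector the second branch would presumably build the topic schema
with the Pulsar client call Schema.KeyValue(keySchema, valueSchema,
KeyValueEncodingType.SEPARATED) instead of a boolean flag.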

Please take a look if you are interested.
There is still much work to be done, but the results are promising.

Enrico


Re: Working on Kafka Connector

Posted by Enrico Olivelli <eo...@gmail.com>.
On Mon, Mar 1, 2021 at 22:20, Sijie Guo <gu...@gmail.com> wrote:

> Enrico - I have just reviewed the PR. I don't think you addressed the
> comments. I still have concerns about how this PR is implemented. I'd
> prefer to keep the Kafka deserializer as simple as possible. We should
> keep the schema cache and the logic to fetch the Confluent schema in the
> source connector.
>

Okay. I will update the patch accordingly.
Thanks

Enrico



Re: Working on Kafka Connector

Posted by Sijie Guo <gu...@gmail.com>.
Enrico - I have just reviewed the PR. I don't think you addressed the
comments. I still have concerns about how this PR is implemented. I'd
prefer to keep the Kafka deserializer as simple as possible. We should
keep the schema cache and the logic to fetch the Confluent schema in the
source connector.
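
One way to picture that split, as a minimal sketch under stated assumptions
rather than the code in the PR: the deserializer only separates the schema id
from the payload, and a cache owned by the source resolves ids to schema
definitions. SchemaCacheSketch, BytesWithSchemaId, SchemaCache and the
registryLookup function are hypothetical names; the lookup stands in for a
real registry client call. The header layout (magic byte 0, then a 4-byte
big-endian schema id) is the Confluent wire format.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

/** Hypothetical sketch: a thin deserializer plus a schema cache owned by the source. */
final class SchemaCacheSketch {

    /** Result of the thin deserializer: the schema id and the still-encoded Avro bytes. */
    static final class BytesWithSchemaId {
        final int schemaId;
        final byte[] payload;

        BytesWithSchemaId(int schemaId, byte[] payload) {
            this.schemaId = schemaId;
            this.payload = payload;
        }
    }

    /** Splits the Confluent wire format: magic byte 0, 4-byte big-endian schema id, payload. */
    static BytesWithSchemaId split(byte[] kafkaValue) {
        if (kafkaValue.length < 5 || kafkaValue[0] != 0) {
            throw new IllegalArgumentException("not in Confluent wire format");
        }
        ByteBuffer buffer = ByteBuffer.wrap(kafkaValue, 1, kafkaValue.length - 1);
        int schemaId = buffer.getInt(); // ByteBuffer reads big-endian by default
        byte[] payload = new byte[buffer.remaining()];
        buffer.get(payload);
        return new BytesWithSchemaId(schemaId, payload);
    }

    /** Kept in the source connector: looks up each schema definition once per id. */
    static final class SchemaCache {
        private final Map<Integer, String> definitions = new ConcurrentHashMap<>();
        private final IntFunction<String> registryLookup; // stand-in for a registry client call

        SchemaCache(IntFunction<String> registryLookup) {
            this.registryLookup = registryLookup;
        }

        String definitionFor(int schemaId) {
            return definitions.computeIfAbsent(schemaId, id -> registryLookup.apply(id));
        }
    }

    public static void main(String[] args) {
        byte[] kafkaValue = {0, 0, 0, 0, 7, 1, 2, 3};
        BytesWithSchemaId parsed = split(kafkaValue);
        SchemaCache cache = new SchemaCache(id -> "fake-schema-" + id);
        System.out.println(parsed.schemaId + " -> " + cache.definitionFor(parsed.schemaId));
    }
}
```

With this shape the deserializer stays free of network calls, and the source
decides when and how schemas are fetched and reused.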

- Sijie


Re: Working on Kafka Connector

Posted by Sijie Guo <gu...@gmail.com>.
Apologies for the delay! Reviewing it now.

- Sijie


Re: Working on Kafka Connector

Posted by Enrico Olivelli <eo...@gmail.com>.
Hello,
Please bear with me, I really want this work to go forward  :-)

@Sijie, I know that you are super busy, so I do not want to put
pressure on you, and I thank you very much for your useful comments on
the PR.

Our Pulsar community is big and it is still growing.
IMHO it would be a very good thing if others in the community took a
look as well.

The first patch is not such a big piece of work, and it should not be
hard to review.
As a general approach I prefer to send small patches; this way it is
easy to understand what's going on.

Code is not set in stone, and we can always make improvements.
My plan is to continue working on the Kafka connector and send more
patches until I have covered all of the use cases I am interested in
(basically around enterprise features, like Schema, multi-topic...)

I would like to work directly here within the project by sending pull
requests to the ASF repo, and I do not want to create my own
Kafka Connector fork.
I believe this is the best approach for the community,
but I need some support from the group.

Best regards
Enrico



On Thu, Feb 25, 2021 at 08:56, Sijie Guo <gu...@gmail.com> wrote:
>
> Apologies for the delay! Will review it again today or tomorrow.
>
> - Sijie
>
> On Wed, Feb 24, 2021 at 3:49 AM Enrico Olivelli <eo...@gmail.com> wrote:
>
> > Hello community,
> > It looks like only Sijie has started to review this work.
> > https://github.com/apache/pulsar/pull/9448
> >
> > I wonder if others who are interested in Kafka compatibility may have
> > time to check it out.
> >
> > As said, this is only the first part of a series of changes we want
> > to make to this Connector.
> >
> > Enrico
> >
> > On Tue, Feb 16, 2021 at 05:31, Sijie Guo <gu...@gmail.com> wrote:
> >
> > > Thanks, I will review the PR.
> > >
> > > - Sijie
> > >
> > > On Mon, Feb 15, 2021 at 2:47 AM Enrico Olivelli <eo...@gmail.com>
> > > wrote:
> > >
> > > > Sijie,
> > > >
> > > > I managed to implement Avro support in KafkaBytesSource following your
> > > > suggestions. Thanks.
> > > >
> > > > I would like to commit this initial patch and then add support for all
> > > > of the primitive Schemas, as you did in (1), and for JSON.
> > > > If you prefer, I can continue to enhance this patch.
> > > >
> > > > Enrico
> > > >
> > > > (1)
> > > > https://github.com/streamnative/pulsar-io-kafka/blob/master/src/main/java/io/streamnative/connectors/kafka/KafkaSource.java#L338
> > > >
> > > > On Mon, Feb 15, 2021 at 06:01, Sijie Guo <guosijie@gmail.com> wrote:
> > > >
> > > > > Hi Enrico,
> > > > >
> > > > > Thank you for working on this!
> > > > >
> > > > > But as I mentioned in the pull request, we should avoid using a
> > > > > one-connector-per-schema model. That model probably works with other
> > > > > connectors that have a very limited number of schemas. If you are
> > > > > going to implement a schema-aware Kafka connector, that model is
> > > > > impossible to maintain, because it will introduce N * N connectors,
> > > > > where N is the number of supported schemas.
> > > > >
> > > > > We should maintain one "bytes" connector and transfer the Kafka
> > > > > schema to the Pulsar schema. I wrote an enhanced Kafka connector
> > > > > <https://github.com/streamnative/pulsar-io-kafka> two years ago.
> > > > >
> > > > > You just need to maintain one connector:
> > > > > https://github.com/streamnative/pulsar-io-kafka/blob/master/src/main/java/io/streamnative/connectors/kafka/KafkaSource.java#L94
> > > > >
> > > > > Then convert Kafka SerDe to Pulsar schema:
> > > > > https://github.com/streamnative/pulsar-io-kafka/blob/master/src/main/java/io/streamnative/connectors/kafka/KafkaSource.java#L338
> > > > >
> > > > > I am happy to submit a PR to merge those changes back.
> > > > >
> > > > > - Sijie
> > > > >
> > > > > On Thu, Feb 11, 2021 at 11:48 PM Enrico Olivelli <eolivelli@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hello everyone,
> > > > > > here in our Pulsar repository we have a simple Kafka Connector for
> > > > > > Pulsar IO, composed of a Sink and a Source.
> > > > > > https://github.com/apache/pulsar/tree/master/pulsar-io/kafka
> > > > > >
> > > > > > I have started to work on a set of enhancements to this connector
> > > > > > in order to make it more powerful and to better fit the needs of
> > > > > > enterprise users.
> > > > > >
> > > > > > The first patch I have submitted adds support for Avro-encoded
> > > > > > messages and the Confluent Schema Registry in the KafkaSource:
> > > > > > https://github.com/apache/pulsar/pull/9448
> > > > > >
> > > > > > This patch is only the first step of a larger effort that we need
> > > > > > in order to have a fully usable Connector for non-trivial use cases.
> > > > > >
> > > > > > I will be happy to follow up with other patches, and especially to
> > > > > > draw up a small roadmap of the features that we want to implement
> > > > > > and provide to the community.
> > > > > >
> > > > > > Please take a look at the patch and share your thoughts.
> > > > > >
> > > > > > Regards
> > > > > > Enrico

Re: Working on Kafka Connector

Posted by Sijie Guo <gu...@gmail.com>.
Apologies for the delay! I will review it again today or tomorrow.

- Sijie

On Wed, Feb 24, 2021 at 3:49 AM Enrico Olivelli <eo...@gmail.com> wrote:

> Hello community,
> It looks like only Sijie started to review this work.
> https://github.com/apache/pulsar/pull/9448
>
> I wonder if others who are interested in Kafka compatibility may have
> time to check it out.
>
> As I said, this is only the first part of a series of changes we want
> to make to this Connector.
>
> Enrico
>

Re: Working on Kafka Connector

Posted by Enrico Olivelli <eo...@gmail.com>.
Hello community,
It looks like only Sijie started to review this work.
https://github.com/apache/pulsar/pull/9448

I wonder if others who are interested in Kafka compatibility may have
time to check it out.

As I said, this is only the first part of a series of changes we want
to make to this Connector.

Enrico

On Tue, Feb 16, 2021 at 05:31, Sijie Guo <gu...@gmail.com> wrote:

> Thanks, I will review the PR.
>
> - Sijie
>

Re: Working on Kafka Connector

Posted by Sijie Guo <gu...@gmail.com>.
Thanks, I will review the PR.

- Sijie

On Mon, Feb 15, 2021 at 2:47 AM Enrico Olivelli <eo...@gmail.com> wrote:

> Sijie,
>
> I managed to implement Avro support in KafkaBytesSource following your
> suggestions. Thanks.
>
> I would like to commit this initial patch and then add support for all of
> the primitive Schemas as you did in (1) and for JSON.
> If you prefer I can continue to enhance this patch.
>
> Enrico
>
> (1)
>
> https://github.com/streamnative/pulsar-io-kafka/blob/master/src/main/java/io/streamnative/connectors/kafka/KafkaSource.java#L338
>

Re: Working on Kafka Connector

Posted by Enrico Olivelli <eo...@gmail.com>.
Sijie,

I managed to implement Avro support in KafkaBytesSource following your
suggestions. Thanks.

I would like to commit this initial patch and then add support for all of
the primitive Schemas as you did in (1) and for JSON.
If you prefer I can continue to enhance this patch.

Enrico

(1)
https://github.com/streamnative/pulsar-io-kafka/blob/master/src/main/java/io/streamnative/connectors/kafka/KafkaSource.java#L338
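
For readers unfamiliar with how a "bytes" source can still be schema-aware
for Avro: records written with the Confluent serializers carry a small
header (one magic byte equal to 0, then a 4-byte big-endian schema id)
before the Avro payload. The following is a minimal sketch of reading that
header, using only the JDK. The class and method names are hypothetical and
this is not necessarily how the patch does it; looking up the schema id in
the Schema Registry is left out.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper, for illustration only: splits a Confluent-framed record. */
final class ConfluentWireFormat {

    /** First byte of every record written by the Confluent serializers. */
    static final byte MAGIC_BYTE = 0x0;

    /** One magic byte plus a 4-byte schema id. */
    static final int HEADER_SIZE = 5;

    private ConfluentWireFormat() {
    }

    /** Returns the Schema Registry id stored in the record header. */
    static int schemaId(byte[] record) {
        if (record == null || record.length < HEADER_SIZE) {
            throw new IllegalArgumentException("record is too short to carry a schema id");
        }
        // ByteBuffer reads in big-endian order by default, which matches the header layout.
        ByteBuffer buffer = ByteBuffer.wrap(record);
        if (buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("unexpected magic byte");
        }
        return buffer.getInt();
    }

    /** Returns the Avro-encoded bytes that follow the header. */
    static byte[] avroPayload(byte[] record) {
        schemaId(record); // validates length and magic byte
        return Arrays.copyOfRange(record, HEADER_SIZE, record.length);
    }
}
```

With the schema id in hand, a source could fetch the Avro schema once per
id and attach it to the outgoing Pulsar message, while the payload itself
is passed through as raw bytes.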

On Mon, Feb 15, 2021 at 06:01, Sijie Guo <gu...@gmail.com> wrote:

> Hi Enrico,
>
> Thank you for working on this!
>
> But as I mentioned in the pull request, we should avoid using a
> one-connector-per-schema model. That model probably works with other
> connectors that have a very limited number of schemas. If you are going to
> implement a schema-aware Kafka connector, that model is impossible to
> maintain, because it will introduce N * N connectors where N is the number
> of supported schemas.
>
> We should maintain one "bytes" connector and map the Kafka schema to
> the Pulsar schema. I wrote an enhanced Kafka connector
> <https://github.com/streamnative/pulsar-io-kafka> two years ago.
>
> You just need to maintain one connector:
>
> https://github.com/streamnative/pulsar-io-kafka/blob/master/src/main/java/io/streamnative/connectors/kafka/KafkaSource.java#L94
> Then convert Kafka SerDe to Pulsar schema:
>
> https://github.com/streamnative/pulsar-io-kafka/blob/master/src/main/java/io/streamnative/connectors/kafka/KafkaSource.java#L338
>
> I am happy to submit a PR to merge those changes back.
>
> - Sijie
>

Re: Working on Kafka Connector

Posted by Sijie Guo <gu...@gmail.com>.
Hi Enrico,

Thank you for working on this!

But as I mentioned in the pull request, we should avoid using a
one-connector-per-schema model. That model probably works with other
connectors that have a very limited number of schemas. If you are going to
implement a schema-aware Kafka connector, that model is impossible to
maintain, because it will introduce N * N connectors where N is the number
of supported schemas.

We should maintain one "bytes" connector and map the Kafka schema to
the Pulsar schema. I wrote an enhanced Kafka connector
<https://github.com/streamnative/pulsar-io-kafka> two years ago.

You just need to maintain one connector:
https://github.com/streamnative/pulsar-io-kafka/blob/master/src/main/java/io/streamnative/connectors/kafka/KafkaSource.java#L94
Then convert Kafka SerDe to Pulsar schema:
https://github.com/streamnative/pulsar-io-kafka/blob/master/src/main/java/io/streamnative/connectors/kafka/KafkaSource.java#L338
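
To illustrate the idea of a single connector that picks the schema from the
configured Kafka SerDe, here is a minimal sketch using only the JDK. It is
not the code behind the links above: the class, the enum, and the fallback
to BYTES for unknown deserializers are assumptions made for the example.
The enum stands in for the Pulsar schema constants (Schema.STRING,
Schema.INT32 and so on), and the keys are the standard Kafka and Confluent
deserializer class names.

```java
import java.util.Map;

/** Hypothetical helper, for illustration only: one lookup instead of one connector per schema. */
final class SerdeSchemaMapping {

    /** Stand-in for the Pulsar schema constants (Schema.BYTES, Schema.STRING, ...). */
    enum SchemaKind { BYTES, STRING, INT16, INT32, INT64, FLOAT, DOUBLE, AVRO }

    private static final String KAFKA = "org.apache.kafka.common.serialization.";

    /** Kafka deserializer class name mapped to the schema kind to publish with. */
    private static final Map<String, SchemaKind> BY_DESERIALIZER = Map.of(
            KAFKA + "ByteArrayDeserializer", SchemaKind.BYTES,
            KAFKA + "StringDeserializer", SchemaKind.STRING,
            KAFKA + "ShortDeserializer", SchemaKind.INT16,
            KAFKA + "IntegerDeserializer", SchemaKind.INT32,
            KAFKA + "LongDeserializer", SchemaKind.INT64,
            KAFKA + "FloatDeserializer", SchemaKind.FLOAT,
            KAFKA + "DoubleDeserializer", SchemaKind.DOUBLE,
            "io.confluent.kafka.serializers.KafkaAvroDeserializer", SchemaKind.AVRO);

    private SerdeSchemaMapping() {
    }

    /**
     * Returns the schema kind for the configured deserializer class.
     * Unknown deserializers fall back to BYTES (an assumption of this sketch),
     * so the record is forwarded untouched.
     */
    static SchemaKind schemaFor(String deserializerClassName) {
        return BY_DESERIALIZER.getOrDefault(deserializerClassName, SchemaKind.BYTES);
    }
}
```

Calling schemaFor once for the key deserializer and once for the value
deserializer covers every combination with the same connector, so adding a
schema means adding one map entry rather than a new connector class.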

I am happy to submit a PR to merge those changes back.

- Sijie

On Thu, Feb 11, 2021 at 11:48 PM Enrico Olivelli <eo...@gmail.com>
wrote:

> Hello everyone,
> here in our Pulsar repository we have a simple Kafka Connector for Pulsar
> IO composed of a Sink and a Source.
> https://github.com/apache/pulsar/tree/master/pulsar-io/kafka
>
> I have started to work on a set of enhancements to this connector in order
> to make it more powerful and to better fit the needs of enterprise users.
>
> The first patch I have submitted is about supporting Avro encoded messages
> + Confluent Schema Registry in the KafkaSource
> https://github.com/apache/pulsar/pull/9448
>
> The patch is only the first part of a larger effort that we have to make in
> order to have a fully usable Connector for non-trivial use cases.
>
> I will be happy to follow up with other patches and especially to draw a
> little roadmap about the features that we want to implement and provide to
> the community.
>
> Please take a look at the patch and share your thoughts.
>
> Regards
> Enrico
>