Posted to dev@kafka.apache.org by Juan Garcia Losada <ju...@plexus.es> on 2017/10/19 07:24:20 UTC

Question about developing a Kafka connector with Avro

Hello everyone,

First, I would like to thank in advance everyone willing to help me with
this issue.

I have been coding a custom Kafka connector that reads from the Twitter
API and writes to a Kafka topic, and I have been trying to use Avro
encoding to write those messages, which I found very difficult. The
technologies used were the Confluent Docker Kafka images 3.1.2 and the
Kafka client 0.10.1.0. To generate the Avro object I used the
avro-maven-plugin, which generates the object from the schema correctly.
However, I found that to create the source record I have to replicate the
same schema in Java code, creating a custom
"org.apache.kafka.connect.data.Schema" and then populating an
org.apache.kafka.connect.data.Struct object with all the information. As
the value converter class I use the AvroConverter, and reviewing its code,
the only way to send a custom object is to send it as a Struct object; the
converter then creates the Avro schema and uses it to encode the object. I
don't understand why it cannot digest the custom class created by the
avro-maven-plugin to serialize the object. I don't know whether this is
the correct way to create a connector with the Avro serializer.

Could I get some feedback on this solution? Thanks to everyone. Xoan.

Re: Question about developing a Kafka connector with Avro

Posted by Randall Hauch <rh...@gmail.com>.
Sounds like you're using Kafka Connect and trying to write a source
connector. The Connect framework separates responsibilities:

1. Your source connector is responsible for generating the SourceRecord
objects and returning them to the Connect framework via your task's
"poll()" method.
2. The Connect framework passes each SourceRecord through the chain of
Single Message Transforms (SMTs), if any are configured for that connector.
Each SMT accepts one SourceRecord and outputs a (presumably) modified
SourceRecord.
3. The SourceRecord output from the last SMT (or from the source connector,
if there are no SMTs) is then *serialized* using the configured Converter.
4. The Connect framework writes the serialized record to the specified
Kafka topic.

The whole purpose of this separation of responsibilities is to make it
easier to write a source connector, since the connector can focus on
reading the external system and generating the SourceRecord objects. It
also means that somebody can use a source connector they didn't write, and
modify the generated SourceRecord objects using SMTs to, for example,
reroute them to different topics or partitions, or change the structure of
the SourceRecord's keys and values. Finally, this separation also means
that somebody can use a connector (and 0 or more SMTs) they didn't write
and control how the records are serialized to Kafka by changing the
converter.
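
To make the SMT step concrete, here is a minimal sketch of a custom
transform that reroutes every record to one fixed topic. The class name
and the "target.topic" config key are made up for illustration; the real
extension point is Connect's Transformation interface:

    import java.util.Map;
    import org.apache.kafka.common.config.ConfigDef;
    import org.apache.kafka.connect.connector.ConnectRecord;
    import org.apache.kafka.connect.transforms.Transformation;

    /** Hypothetical SMT that reroutes every record to one fixed topic. */
    public class RerouteToTopic<R extends ConnectRecord<R>>
            implements Transformation<R> {
        private String topic;

        @Override
        public void configure(Map<String, ?> configs) {
            topic = (String) configs.get("target.topic"); // made-up key
        }

        @Override
        public R apply(R record) {
            // newRecord(...) copies the record, substituting only the topic.
            return record.newRecord(topic, record.kafkaPartition(),
                    record.keySchema(), record.key(),
                    record.valueSchema(), record.value(),
                    record.timestamp());
        }

        @Override
        public ConfigDef config() {
            return new ConfigDef();
        }

        @Override
        public void close() {
        }
    }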

So, your Twitter source connector should create the SourceRecord objects
representing the tweets (or whatever you're ingesting) and return them to
the Connect framework via your task's "poll()" method. This is the same no
matter how you want to serialize the records into a binary form written to
Kafka topics.
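
As a rough sketch of that (the topic name, the partition/offset keys, and
the fetchNextTweet() helper are all made up for illustration), a source
task's poll() might look like:

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.source.SourceRecord;
    import org.apache.kafka.connect.source.SourceTask;

    public class TwitterSourceTask extends SourceTask {
        @Override
        public List<SourceRecord> poll() throws InterruptedException {
            String tweet = fetchNextTweet(); // hypothetical Twitter API call
            Map<String, ?> sourcePartition =
                    Collections.singletonMap("stream", "sample");
            Map<String, ?> sourceOffset =
                    Collections.singletonMap("position", 42L);
            // The framework, not the task, serializes this record later.
            SourceRecord record = new SourceRecord(sourcePartition,
                    sourceOffset, "tweets", Schema.STRING_SCHEMA, tweet);
            return Collections.singletonList(record);
        }

        @Override
        public void start(Map<String, String> props) {
        }

        @Override
        public void stop() {
        }

        @Override
        public String version() {
            return "0.1";
        }

        private String fetchNextTweet() {
            return "hello"; // stand-in for a real Twitter API call
        }
    }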

You have a number of converters to choose from. First, Connect ships with
the JSON converter, which writes out to a JSON representation. To serialize
to Avro, use Confluent's Avro Converter, or implement and use your own
converter if you so choose. There are other converters out there.
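
As an illustration, selecting Confluent's Avro converter is purely worker
configuration, along the lines of the following (the registry URLs are
placeholders):

    key.converter=io.confluent.connect.avro.AvroConverter
    key.converter.schema.registry.url=http://localhost:8081
    value.converter=io.confluent.connect.avro.AvroConverter
    value.converter.schema.registry.url=http://localhost:8081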

By the way, Connect defines its own Schema framework that doesn't tie
itself to Avro, but still gives connectors nearly all of the power and
flexibility of using Avro. What it doesn't allow you to do, however, is
create Java classes for your schemas; instead, you use the Struct and
Schema objects and populate your Structs similar to how you might do it
with Avro's GenericRecords.
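
For instance, here is a minimal sketch with a made-up tweet shape (the
schema name and fields are illustrative, not a real Twitter schema):

    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.SchemaBuilder;
    import org.apache.kafka.connect.data.Struct;

    // Declare the record's shape once, with Connect's Schema, not Avro's.
    Schema tweetSchema = SchemaBuilder.struct()
            .name("com.example.Tweet")                  // illustrative name
            .field("id", Schema.INT64_SCHEMA)
            .field("text", Schema.STRING_SCHEMA)
            .field("user", Schema.OPTIONAL_STRING_SCHEMA)
            .build();

    // Populate a Struct much as you would an Avro GenericRecord.
    Struct tweet = new Struct(tweetSchema)
            .put("id", 123L)
            .put("text", "hello")
            .put("user", "someone");

The Avro converter maps this Connect schema to an Avro schema at
serialization time, which is why it expects a Struct rather than a class
generated by the avro-maven-plugin.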

Now, you may decide that you want your code to do a lot more and directly
write out to Kafka. In that case, your code will have to set up and use a
producer, know how to read the Twitter firehose, know how to serialize the
records (e.g., hard-code using the Avro serializer), know how to write
those records to Kafka via the producer, and track the progress of where
you are in the firehose. If you want to share this and let other people use
it, you'd need to make it all configurable so that people can adapt the
structure of the records to their own needs, and change how they're
serializing (e.g., using Protobuf rather than Avro). If you take this
approach, you'll be replicating a lot of what Connect is *already* doing
for you.
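
For completeness, a bare-bones version of that do-it-yourself path might
look like the sketch below; the broker and registry addresses are
placeholders, and it writes a single hard-coded record:

    import java.util.Properties;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");          // placeholder
    props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer",
            "io.confluent.kafka.serializers.KafkaAvroSerializer");
    props.put("schema.registry.url", "http://localhost:8081"); // placeholder

    Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Tweet\","
            + "\"fields\":[{\"name\":\"text\",\"type\":\"string\"}]}");
    GenericRecord value = new GenericData.Record(schema);
    value.put("text", "hello");

    try (KafkaProducer<String, GenericRecord> producer =
            new KafkaProducer<>(props)) {
        // You also track your own position in the firehose; Connect
        // would otherwise manage offsets for you.
        producer.send(new ProducerRecord<>("tweets", value));
    }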

I hope this helps explain why Connect is the way it is.

Randall



