Posted to users@kafka.apache.org by Uber Slacker <cu...@gmail.com> on 2016/04/06 17:56:08 UTC

Kafka Connect concept question

Hi folks. I'm pretty new to Kafka. I have spent a fair amount of time so
far understanding the Kafka system in general and how producers and
consumers work. I'm now trying to get a grasp on how Kafka Connect
compares/contrasts to Producers/Consumers written via the Java API.

When might someone want to write their own Java Producer/Consumer versus
using a connector in Kafka Connect?  How does Kafka Connect use producers
and consumers behind the scenes? Why wouldn't we simply want a
producer/consumer library that contains producers and consumers written to
work with various external systems such as HDFS? Why this new framework?
Thanks for any clarification!

Re: Kafka Connect concept question

Posted by Jay Kreps <ja...@confluent.io>.
Another way to think about this is that the producer allows you to PUSH
data into Kafka and the consumer allows you to PULL data out. This is what
you need to write an application.

However, for an existing data system you need the opposite: you need to PULL
data into Kafka from the system, or PUSH it out of Kafka into the system.
Kafka Connect implements this. Technically you could implement this from
scratch, but then you'd just be rebuilding what Connect itself does (as Ewen
said).
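A minimal Java sketch of the direction-of-control difference described above. Every name here (the topic list, applicationSend, SourceTaskSketch, TableReader, runWorker) is a hypothetical, simplified stand-in and not the real Kafka API. In the application half, your code decides when data moves, the way it would call KafkaProducer.send(...). In the Connect-style half, the framework owns the loop and keeps asking the task for new data, loosely mirroring how Connect calls SourceTask.poll().

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of PUSH (application) vs PULL (Connect-style source).
// None of these types are the real Kafka or Connect APIs.
public class PushVsPull {

    // Stand-in for a Kafka topic: an append-only list of values.
    static final List<String> topic = new ArrayList<>();

    // Application style: the application PUSHes whenever it has data,
    // as it would with producer.send(...) on the real producer.
    static void applicationSend(String value) {
        topic.add(value);
    }

    // Connect style: the external system pushes nothing by itself.
    // The task only answers "what is new?" when asked (cf. SourceTask.poll()).
    interface SourceTaskSketch {
        List<String> poll();
    }

    // Hypothetical task reading rows from an in-memory "table".
    static class TableReader implements SourceTaskSketch {
        private final Deque<String> pendingRows;

        TableReader(List<String> rows) {
            this.pendingRows = new ArrayDeque<>(rows);
        }

        @Override
        public List<String> poll() {
            List<String> batch = new ArrayList<>();
            // Return at most two rows per call to mimic batched reads.
            while (batch.size() < 2 && !pendingRows.isEmpty()) {
                batch.add(pendingRows.poll());
            }
            return batch;
        }
    }

    // The framework owns the loop: it PULLs from the task and writes to the
    // topic on the task's behalf (real Connect uses an internal producer).
    static void runWorker(SourceTaskSketch task, int maxPolls) {
        for (int i = 0; i < maxPolls; i++) {
            topic.addAll(task.poll());
        }
    }

    public static void main(String[] args) {
        applicationSend("order-1");
        runWorker(new TableReader(List.of("row-a", "row-b", "row-c")), 3);
        System.out.println(topic); // prints [order-1, row-a, row-b, row-c]
    }
}
```

The sink direction is the mirror image: in Connect the framework consumes from Kafka and hands batches to the task through SinkTask.put(...), and the task only has to write them into the external system.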

Not sure if that made things more or less clear :-)

-Jay

On Thu, Apr 7, 2016 at 1:41 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> You'll want to use producers and consumers directly if you need
> more control that Connect hides from you, but then you'll need to create
> your own implementation of features Connect provides (or simply not support
> them).

Re: Kafka Connect concept question

Posted by Uber Slacker <cu...@gmail.com>.
Thanks for the explanations, guys. It would be cool to see a section in the
documentation that explicitly compares and contrasts Kafka Connect with
using the producer and consumer APIs directly. That's just my
perspective as a newb; perhaps it's clear to others. Thanks again!

Re: Kafka Connect concept question

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
Internally, Connect does use the producer and consumer. However, the
framework adds a lot of support for functionality you want specifically
when you are copying data from another system to Kafka or from Kafka to
another system. Connect handles distribution and fault tolerance for you at
the framework level. It also provides a schema/data API and abstracts away
the details of serialization, so you can write a single connector and
support multiple formats.
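To illustrate the serialization point, here is a minimal Java sketch, assuming a connector that hands the framework a format-neutral row and a pluggable converter that decides the bytes. In the real framework this role is played by Connect's Converter interface, chosen with the worker's key.converter and value.converter settings; the Row, RowConverter, JsonLikeConverter and CsvConverter types below are hypothetical simplifications, not that API.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: the connector emits a format-neutral row and a
// pluggable converter decides which bytes end up in Kafka.
public class ConverterSketch {

    // Format-neutral record: field names to values, kept in insertion order.
    static final class Row {
        final Map<String, Object> fields;

        Row(Map<String, Object> fields) {
            this.fields = fields;
        }
    }

    // Simplified stand-in for a converter; NOT the real Connect Converter interface.
    interface RowConverter {
        byte[] toBytes(Row row);
    }

    // Renders a row as a flat JSON-like object. Only numbers and plain strings
    // are handled and nothing is escaped; enough for the illustration.
    static class JsonLikeConverter implements RowConverter {
        @Override
        public byte[] toBytes(Row row) {
            String body = row.fields.entrySet().stream()
                    .map(e -> "\"" + e.getKey() + "\":" + render(e.getValue()))
                    .collect(Collectors.joining(","));
            return ("{" + body + "}").getBytes(StandardCharsets.UTF_8);
        }

        private static String render(Object value) {
            return (value instanceof Number) ? value.toString() : "\"" + value + "\"";
        }
    }

    // Renders the same row as comma-separated values, in field order.
    static class CsvConverter implements RowConverter {
        @Override
        public byte[] toBytes(Row row) {
            return row.fields.values().stream()
                    .map(v -> String.valueOf(v))
                    .collect(Collectors.joining(","))
                    .getBytes(StandardCharsets.UTF_8);
        }
    }

    // The "connector" side: it only knows how to read rows, never bytes.
    static Row readRowFromSourceSystem() {
        Map<String, Object> fields = new LinkedHashMap<>();
        fields.put("id", 42);
        fields.put("name", "widget");
        return new Row(fields);
    }

    public static void main(String[] args) {
        Row row = readRowFromSourceSystem();
        RowConverter[] converters = {new JsonLikeConverter(), new CsvConverter()};
        for (RowConverter converter : converters) {
            System.out.println(new String(converter.toBytes(row), StandardCharsets.UTF_8));
        }
    }
}
```

Running main prints the same row once per converter: {"id":42,"name":"widget"} and then 42,widget. Changing the output format touched only the converter, not the code that reads from the source system.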

If you're trying to copy data to/from another system, we'd generally
recommend using the Connect framework, since it adds all this extra support
and lets you focus only on how you get the data into/out of the other
system. You'll want to use producers and consumers directly if you need
control over details that Connect hides from you, but then you'll need to
build your own implementation of the features Connect provides (or simply
not support them).
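As a concrete example of what building your own implementation can mean, here is a minimal, hypothetical Java sketch of just one such feature: tracking how far a copy job has progressed so that a restart resumes instead of starting over. In Connect, a source task attaches a source partition and source offset to each SourceRecord and the framework stores them; the offsetStore map and copy method below are illustrative stand-ins for doing the same by hand, with no real Kafka calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical hand-rolled copier: without Connect, remembering how far you
// got, so that a restart neither skips nor re-copies rows, is your own job.
public class HandRolledCopier {

    // Stand-in for durable progress storage (a file, a database table, ...).
    static final Map<String, Integer> offsetStore = new HashMap<>();

    // Stand-in for the destination Kafka topic.
    static final List<String> topic = new ArrayList<>();

    // Copies rows starting from the last stored position for this source.
    // failAfter simulates a crash after that many rows in this run; -1 means no crash.
    static void copy(String sourceName, List<String> rows, int failAfter) {
        int next = offsetStore.getOrDefault(sourceName, 0);
        int copiedThisRun = 0;
        while (next < rows.size()) {
            if (copiedThisRun == failAfter) {
                throw new IllegalStateException("simulated crash");
            }
            topic.add(rows.get(next));
            next++;
            copiedThisRun++;
            // Record progress only after the write. A crash between the write
            // and this line would re-copy one row on restart (at-least-once).
            offsetStore.put(sourceName, next);
        }
    }

    public static void main(String[] args) {
        List<String> rows = List.of("r0", "r1", "r2", "r3");
        try {
            copy("orders", rows, 2);
        } catch (IllegalStateException e) {
            System.out.println("crashed, stored offset = " + offsetStore.get("orders"));
        }
        copy("orders", rows, -1); // the "restart" resumes from the stored offset
        System.out.println(topic);
    }
}
```

This prints "crashed, stored offset = 2" and then [r0, r1, r2, r3], with no row copied twice. Spreading the work across machines, reassigning it when a worker dies, and batching the offset commits are further pieces you would still have to build on top of this.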

-Ewen