Posted to users@kafka.apache.org by "Tauzell, Dave" <Da...@surescripts.com> on 2016/06/14 15:08:25 UTC

Kafka Connect HdfsSink and the Schema Registry

I have been able to get my C# client to put Avro records to a Kafka topic and have the HdfsSink read and save them in files.  I am confused about the interaction with the registry.  The Kafka message contains a schema ID, and I see the connector look that up in the registry.  Then it also looks up a subject, which is <topic>-value.

What is the relationship between the passed schema id and the subject which is derived from the topic name?

-Dave

This e-mail and any files transmitted with it are confidential, may contain sensitive information, and are intended solely for the use of the individual or entity to whom they are addressed. If you have received this e-mail in error, please notify the sender by reply e-mail immediately and destroy all copies of the e-mail and any attachments.

Re: Kafka Connect HdfsSink and the Schema Registry

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
Great, glad you sorted it out. If the namespace is being omitted
incorrectly from the request the connector is making, please file a bug
report -- I can't think of a reason we'd omit that, but it's certainly
possible it is a bug on our side.

-Ewen

On Wed, Jun 15, 2016 at 7:08 AM, Tauzell, Dave <Dave.Tauzell@surescripts.com
> wrote:

> Thanks Ewen,
>
> The second request was made by me directly.  I'm trying to add this
> functionality into my .NET application.  The library I'm using doesn't have
> an implementation of the AvroSerializer that interacts with the schema
> registry.  I've now added code to make a POST to
> /subjects/<topic>-value with the schema.  Something I noticed is that I
> was using a schema like this:
>
> {
>   "subject": "AuditHdfsTest5-value",
>   "version": 1,
>   "id": 5,
>   "schema":
> "{\"type\":\"record\",\"name\":\"GenericAuditRecord\",\"namespace\":\"audit\",\"fields\":[{\"name\":\"xml\",\"type\":[\"string\",\"null\"]}]}"
> }
>
> When the connector got a message and did a lookup it didn't have the
> "namespace" field and the lookup failed.  I then posted a new version of
> the schema without the "namespace" field and it worked.
>
> -Dave
>
> Dave Tauzell | Senior Software Engineer | Surescripts
> O: 651.855.3042 | www.surescripts.com |   Dave.Tauzell@surescripts.com
> Connect with us: Twitter I LinkedIn I Facebook I YouTube
>



-- 
Thanks,
Ewen

RE: Kafka Connect HdfsSink and the Schema Registry

Posted by "Tauzell, Dave" <Da...@surescripts.com>.
Thanks Ewen,

The second request was made by me directly.  I'm trying to add this functionality into my .NET application.  The library I'm using doesn't have an implementation of the AvroSerializer that interacts with the schema registry.  I've now added code to make a POST to /subjects/<topic>-value with the schema.  Something I noticed is that I was using a schema like this:

{
  "subject": "AuditHdfsTest5-value",
  "version": 1,
  "id": 5,
  "schema": "{\"type\":\"record\",\"name\":\"GenericAuditRecord\",\"namespace\":\"audit\",\"fields\":[{\"name\":\"xml\",\"type\":[\"string\",\"null\"]}]}"
}

When the connector got a message and did a lookup it didn't have the "namespace" field and the lookup failed.  I then posted a new version of the schema without the "namespace" field and it worked.
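
To illustrate the requests discussed here, below is a minimal Python sketch (Python rather than .NET, purely for illustration) that builds, but does not send, the two Schema Registry calls involved: POST /subjects/<subject> looks up a schema that is already registered under a subject, and POST /subjects/<subject>/versions registers a new version. The JSON shown above, with "subject", "version", "id" and "schema", appears to match the response of the lookup call. The base URL http://localhost:8081 and the helper names registry_request and full_name are assumptions made for this example. The sketch also shows one plausible reading of the failed lookup: a record with a namespace and the same record without one have different Avro full names, so the registry treats them as different schemas.

```python
import json
import urllib.request

# Content type used by the Schema Registry REST API.
CONTENT_TYPE = "application/vnd.schemaregistry.v1+json"


def registry_request(base_url, subject, schema, register=False):
    """Build (but do not send) a Schema Registry request.

    register=False -> POST /subjects/<subject>           (look up an existing schema)
    register=True  -> POST /subjects/<subject>/versions  (register a new version)
    """
    path = f"/subjects/{subject}" + ("/versions" if register else "")
    # The registry expects the Avro schema as a JSON *string* inside the body.
    body = json.dumps({"schema": json.dumps(schema, separators=(",", ":"))})
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body.encode("utf-8"),
        headers={"Content-Type": CONTENT_TYPE},
        method="POST",
    )


def full_name(schema):
    """Avro full name of a named schema (simplified: no inherited namespaces)."""
    name = schema["name"]
    if "." in name:  # a dotted name is already a full name
        return name
    namespace = schema.get("namespace")
    return f"{namespace}.{name}" if namespace else name


# The schema from the message above, with and without the namespace.
with_ns = {
    "type": "record",
    "name": "GenericAuditRecord",
    "namespace": "audit",
    "fields": [{"name": "xml", "type": ["string", "null"]}],
}
without_ns = {k: v for k, v in with_ns.items() if k != "namespace"}

# Placeholder URL; nothing is sent over the network here.
lookup = registry_request("http://localhost:8081", "AuditHdfsTest5-value", with_ns)
```

Passing the request to urllib.request.urlopen would actually send it. A lookup for a schema that was never registered under the subject is answered with an error, which is consistent with the behaviour described above when the two sides disagree about the namespace.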

-Dave

Dave Tauzell | Senior Software Engineer | Surescripts
O: 651.855.3042 | www.surescripts.com |   Dave.Tauzell@surescripts.com
Connect with us: Twitter I LinkedIn I Facebook I YouTube


-----Original Message-----
From: Ewen Cheslack-Postava [mailto:ewen@confluent.io]
Sent: Tuesday, June 14, 2016 6:59 PM
To: users@kafka.apache.org
Subject: Re: Kafka Connect HdfsSink and the Schema Registry

On Tue, Jun 14, 2016 at 8:08 AM, Tauzell, Dave <Dave.Tauzell@surescripts.com
> wrote:

> I have been able to get my C# client to put Avro records to a Kafka
> topic and have the HdfsSink read and save them in files.  I am
> confused about the interaction with the registry.  The Kafka message
> contains a schema ID, and I see the connector look that up in the
> registry.  Then it also looks up a subject, which is <topic>-value.
>
> What is the relationship between the passed schema id and the subject
> which is derived from the topic name?
>

The HDFS connector doesn't work directly with the schema registry; the AvroConverter does. I'm not sure what the second request you're seeing is; normally it would only look up the schema ID in order to get the schema. Where are you seeing the second request, and can you include some logs? I can't think of any other requests the AvroConverter would be making just for deserialization.

The subject names are generated in the serializer as <topic>-key and <topic>-value; this is just the standardized approach Confluent's serializers use. The ID will have been registered under that subject.

-Ewen



Re: Kafka Connect HdfsSink and the Schema Registry

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
On Tue, Jun 14, 2016 at 8:08 AM, Tauzell, Dave <Dave.Tauzell@surescripts.com
> wrote:

> I have been able to get my C# client to put Avro records to a Kafka topic
> and have the HdfsSink read and save them in files.  I am confused about
> the interaction with the registry.  The Kafka message contains a schema ID,
> and I see the connector look that up in the registry.  Then it also looks
> up a subject, which is <topic>-value.
>
> What is the relationship between the passed schema id and the subject
> which is derived from the topic name?
>

The HDFS connector doesn't work directly with the schema registry; the
AvroConverter does. I'm not sure what the second request you're seeing is;
normally it would only look up the schema ID in order to get the schema.
Where are you seeing the second request, and can you include some logs? I
can't think of any other requests the AvroConverter would be making just
for deserialization.

The subject names are generated in the serializer as <topic>-key and
<topic>-value; this is just the standardized approach Confluent's
serializers use. The ID will have been registered under that subject.
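
As a rough illustration of how the schema ID and the subject relate, here is a minimal Python sketch of the framing Confluent's serializers put around each Avro payload: a zero "magic" byte followed by the schema ID as a 4-byte big-endian integer. The subject (<topic>-key or <topic>-value) matters when the producer registers the schema and obtains the ID; the consuming side only needs the ID carried in the message. The function names subject_for, frame and unframe are made up for this example and are not part of any library.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0  # first byte of the Confluent serializer framing


def subject_for(topic: str, is_key: bool = False) -> str:
    """Default subject naming described above: <topic>-key or <topic>-value."""
    return f"{topic}-{'key' if is_key else 'value'}"


def frame(schema_id: int, avro_payload: bytes) -> bytes:
    """Prefix Avro-encoded bytes with the magic byte and a 4-byte big-endian schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + avro_payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, avro_payload)."""
    if len(message) < 5:
        raise ValueError("message too short to carry a schema ID")
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[5:]


# Producer side: register the schema under subject_for(topic), then frame the
# payload with the returned ID. The payload bytes here are a stand-in, not
# real Avro-encoded data.
message = frame(5, b"\x02x")

# Consumer side: only the ID is needed to fetch the schema from the registry.
schema_id, payload = unframe(message)
```

A client without a registry-aware serializer, such as the .NET one in this thread, would need to produce the same five-byte prefix for the AvroConverter to find the schema.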

-Ewen





-- 
Thanks,
Ewen