Posted to dev@pulsar.apache.org by Adrian Iacob-Ghiula <ad...@gmail.com> on 2024/03/07 19:27:04 UTC

[DISCUSS] Pulsar Client Go - Avro Schema

Hi Everybody,

Pulsar Client Go uses https://github.com/linkedin/goavro for Avro-encoded
messages. When a message has a schema version, pulsar.Schema is created
automatically and there is no way to override it.
A Consumer / Reader has access to the raw bytes of the message but does not
have access to the schema-definition of the message.
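To make the problem concrete: Avro binary data is not self-describing, so the raw bytes can only be interpreted with the schema in hand. The sketch below hard-codes an assumed schema, a record with a single field `age` of type ["null", "int"], and decodes it by hand. Per the Avro specification, a union is written as a zig-zag varint branch index followed by the branch value, so `0x02 0x3c` means "age = 30" only to a reader that knows the schema. `decodeNullableAge` is a hypothetical helper using only the Go standard library; it is not how goavro or hamba/avro is implemented.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeNullableAge decodes the Avro binary encoding of a record whose only
// field is `age` with union type ["null", "int"]. Hypothetical helper: the
// schema is hard-coded here instead of being fetched from a registry.
func decodeNullableAge(buf []byte) (*int, error) {
	// Avro writes int/long as zig-zag base-128 varints, which is the format
	// encoding/binary.Varint decodes.
	branch, n := binary.Varint(buf)
	if n <= 0 {
		return nil, errors.New("truncated union branch index")
	}
	switch branch {
	case 0: // "null" branch: no further bytes
		return nil, nil
	case 1: // "int" branch: a zig-zag varint follows
		v, m := binary.Varint(buf[n:])
		if m <= 0 {
			return nil, errors.New("truncated int value")
		}
		age := int(v)
		return &age, nil
	default:
		return nil, fmt.Errorf("unexpected union branch %d", branch)
	}
}

func main() {
	age, _ := decodeNullableAge([]byte{0x02, 0x3c}) // branch 1, value 30
	fmt.Println(*age)                               // 30
	none, _ := decodeNullableAge([]byte{0x00})      // branch 0 (null)
	fmt.Println(none == nil)                        // true
}
```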

A) Should the Consumer / Reader be allowed to override the Schema creation
after the schema definition is retrieved from the Schema Registry?
B) Should the Consumer / Reader be able to query the Schema Registry to
retrieve the schema definition of the message?
C) Should we just drop linkedin/goavro and migrate to
https://github.com/hamba/avro? Reasons: Avro-to-Go struct generation (see
avrogen); easier handling of "nullable" unions by declaring the field as a
pointer; benchmarks show it is faster than linkedin/goavro.

I would go for the flexible option, since it will not break consumers using
linkedin/goavro, but advice would be welcome :)

Thanks

Re: [DISCUSS] Pulsar Client Go - Avro Schema

Posted by Zike Yang <zi...@apache.org>.
Hi,

> A) Should the Consumer / Reader be allowed to override the Schema creation
> after schema-definition is retrieved from Schema Registry ?

Are you suggesting we adjust the schema creator, allowing the
Consumer/Reader to use a different Avro library? I'm interested in
others' thoughts.

> B) Should the Consumer / Reader have access to query Schema Registry for
> retrieving the schema-definition of the message ?

This has already been implemented. The Consumer/Reader attempts to
fetch the writer's schema from the registry to decode the message. If
that fails, it falls back to the consumer's/reader's own schema for decoding.

> C) Should just drop linkedin/goavro and migrate to
> https://github.com/hamba/avro because: avro to golang struct generation ->
> see avrogen; Easier handling of "nullable" union. by having a field as a
> pointer; benchmark shows to be faster than linkedin.

Thanks for bringing hamba/avro into the discussion.
Regarding Avro-to-Go struct generation, we can use
`gogen-avro` with goavro.
Considering the 'nullable' union handling and the benchmark results,
hamba/avro appears to be more user-friendly and faster.

Below, I've provided two examples to illustrate the differences in how
these two Avro libraries handle nullable unions.

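type TestHambaAvro struct {
    // Avro union ["null", "int"]: nil means null, otherwise the pointer holds the value.
    Age *int `avro:"age"`
}

type TestGoavro struct {
    // goavro represents the same union as nil (null) or a single-entry map
    // keyed by the branch name, e.g. map[string]interface{}{"int": int32(30)}.
    Age map[string]interface{} `avro:"age"`
}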

Clearly, handling the 'age' field with hamba/avro is more intuitive.
In contrast, using goavro requires us to define a
map[string]interface{} type.
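To show the extra work the goavro representation implies, here is a small adapter that turns a goavro-style union value into the `*int` that hamba/avro would give directly. It assumes goavro's documented convention: a null union decodes to nil, a non-null one to a single-entry map keyed by the branch type name, and an Avro `int` decodes to Go `int32`. `unionToIntPtr` is a hypothetical helper, not part of either library.

```go
package main

import "fmt"

// unionToIntPtr converts a goavro-style value for the Avro union
// ["null", "int"] into the *int form used by the hamba/avro struct above.
// Hypothetical helper for illustration.
func unionToIntPtr(v interface{}) (*int, error) {
	if v == nil { // null branch
		return nil, nil
	}
	m, ok := v.(map[string]interface{})
	if !ok {
		return nil, fmt.Errorf("expected map for union value, got %T", v)
	}
	raw, ok := m["int"]
	if !ok {
		return nil, fmt.Errorf("union value has no \"int\" branch: %v", m)
	}
	switch n := raw.(type) {
	case int32: // goavro decodes Avro int as int32
		i := int(n)
		return &i, nil
	case int: // tolerate values built by hand with a plain int
		return &n, nil
	default:
		return nil, fmt.Errorf("unexpected type %T for int branch", raw)
	}
}

func main() {
	age, _ := unionToIntPtr(map[string]interface{}{"int": int32(30)})
	fmt.Println(*age) // 30
	none, _ := unionToIntPtr(nil)
	fmt.Println(none == nil) // true
}
```

Every nullable field needs this kind of unwrapping with goavro, which is the usability gap the comparison above points at.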

I'd like to hear from others. Would it be reasonable to provide an
option for Go client users to switch between these two Avro libraries?
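One possible shape for such an option, which also covers proposal A (building the codec from the schema definition retrieved from the registry), is sketched below. Everything here is hypothetical: `AvroLibrary`, `AvroCodec`, `newAvroCodec` and the stub codecs are illustrative names, not existing pulsar-client-go API, and the stubs stand in for real adapters around goavro and hamba/avro so the sketch runs without dependencies.

```go
package main

import (
	"errors"
	"fmt"
)

// AvroLibrary selects the Avro implementation behind an Avro schema.
// Hypothetical option: the zero value keeps today's goavro behaviour.
type AvroLibrary int

const (
	AvroLibraryGoavro AvroLibrary = iota // default: linkedin/goavro
	AvroLibraryHamba                     // opt-in: hamba/avro
)

// AvroCodec is the minimal surface the client would need from a library.
type AvroCodec interface {
	Name() string
	Decode(data []byte, v interface{}) error
}

// stubCodec stands in for a real library adapter in this sketch.
type stubCodec struct{ name string }

func (c stubCodec) Name() string { return c.name }

func (c stubCodec) Decode([]byte, interface{}) error {
	return errors.New(c.name + ": decoding not implemented in this sketch")
}

// codecFactories maps each option to a constructor that receives the schema
// definition retrieved from the registry.
var codecFactories = map[AvroLibrary]func(schemaDef string) (AvroCodec, error){
	AvroLibraryGoavro: func(string) (AvroCodec, error) { return stubCodec{"goavro"}, nil },
	AvroLibraryHamba:  func(string) (AvroCodec, error) { return stubCodec{"hamba"}, nil },
}

// newAvroCodec builds the codec for the selected library.
func newAvroCodec(lib AvroLibrary, schemaDef string) (AvroCodec, error) {
	factory, ok := codecFactories[lib]
	if !ok {
		return nil, fmt.Errorf("unknown Avro library %d", lib)
	}
	return factory(schemaDef)
}

func main() {
	var lib AvroLibrary // zero value selects goavro
	c, _ := newAvroCodec(lib, `"int"`)
	fmt.Println(c.Name()) // goavro
	c, _ = newAvroCodec(AvroLibraryHamba, `"int"`)
	fmt.Println(c.Name()) // hamba
}
```

Making goavro the zero value means users who never set the option keep the current behaviour, which addresses the concern about not breaking existing goavro consumers.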

Thanks,
Zike Yang
