Posted to dev@pulsar.apache.org by Devin Bost <de...@gmail.com> on 2023/06/21 00:35:16 UTC

Has anyone EVER gotten a Python function to work with Avro??

After many of my own attempts, research, digging through source code, and
speaking with folks in various channels in the community, I'm starting to
wonder if *anyone* has *ever* successfully gotten Avro to work with Python
Functions.

(I don't just mean ingesting a byte array with fastavro but actually using
the built-in schema support that Pulsar Python functions are intended to
support - hence the purpose of combining built-in Avro internals with
multi-language support.) This capability is a core part of the community
offering to support Python, and as we've standardized on Avro internals,
I'm concerned we may have a gap in our ability to support this combination
of technologies, which can impact adoption in organizations that have a
heavy investment in both Python and Java (such as for different teams) when
Avro has already been standardized on.

I've brought this question up in various places/groups for almost 3 years
now, and I'm starting to wonder if *nobody* has actually done it.

I've seen examples of using Python producers and consumers with Avro, but
the interaction is different because those interfaces allow the Schema to
be explicitly specified. It's not clear from the source code how (or if)
this can be done currently with the Python Functions API.

If there's a feature gap here, then we need to decide if it's a priority to
address. This is becoming increasingly important as the Python userbase is
growing significantly, but I'd like to hear thoughts from others,
especially since Lari recently asked if we should be considering wider
changes to the Function API internals.

Devin G. Bost

Re: Has anyone EVER gotten a Python function to work with Avro??

Posted by Pengcheng Jiang <pe...@streamnative.io.INVALID>.
Hello Devin,

Support for Avro schemas in Python functions was added recently and
released in v3.0.0.

There is an example of using Avro in a Python function:
https://github.com/apache/pulsar/blob/660525e57ed35b74cb9204521d1fba02cc08c542/pulsar-functions/python-examples/avro_schema_test_function.py

The test function can be submitted with the following command:

```
bin/pulsar-admin functions create --name test-avro-py \
--tenant public --namespace default \
--inputs persistent://public/default/test-input \
--output persistent://public/default/test-output \
--py avro_schema_test_function.py \
--className avro_schema_test_function.AvroSchemaTestFunction \
--schema-type avro \
--input-type-class-name avro_schema_test_function.AvroTestObject \
--output-type-class-name avro_schema_test_function.AvroTestObject
```
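
For readers who cannot open the link, a module of the shape this command expects might look roughly like the sketch below. It is not the contents of the linked file: the field names `name` and `count` and the body of `process` are hypothetical, and it assumes the runtime passes `process` an already-decoded instance of the input class. Only the two class names are taken from the command above. The `except ImportError` branch defines small stand-ins purely so the logic can be dry-run on a machine without the `pulsar-client` package; they are not part of the Pulsar API.

```python
# avro_schema_test_function.py (illustrative sketch, not the linked example)
try:
    from pulsar import Function
    from pulsar.schema import Record, String, Integer
except ImportError:
    # Stand-ins for a local dry run without pulsar-client installed.
    # These are NOT the real Pulsar classes.
    class Function:
        pass

    class Record:
        def __init__(self, **fields):
            for key, value in fields.items():
                setattr(self, key, value)

    def String():
        return None

    def Integer():
        return None


class AvroTestObject(Record):
    # Hypothetical fields; the Avro schema is derived from these declarations.
    name = String()
    count = Integer()


class AvroSchemaTestFunction(Function):
    def process(self, input, context):
        # Assumption: `input` arrives as an AvroTestObject instance.
        # Return a new record so the output is encoded with the same class.
        return AvroTestObject(name=input.name.upper(), count=input.count + 1)
```

The file name matters here: `--className avro_schema_test_function.AvroSchemaTestFunction` refers to the module name followed by the class name, so the sketch would need to be saved as `avro_schema_test_function.py` for the flags above to resolve.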

Sincerely
Pengcheng Jiang
