Posted to dev@pulsar.apache.org by Neng Lu <nl...@apache.org> on 2023/04/27 22:59:15 UTC

[DISCUSS] Improve Pulsar Function Source Primitive Schema Mapping

Hi All,

Based on [1], Pulsar defines a set of primitive schema types with a clear
mapping from Java classes to those types.

But in the code [2], Pulsar Function Source only maps the byte and String
Java classes to primitive schemas and defaults all other primitive types
to the JSON schema. Also, for byte class types, the NONE schema is used
instead of the BYTES schema.
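
To make the gap concrete, here is a minimal, self-contained sketch that contrasts the defaulting described above with a mapping that follows [1]. It does not use the Pulsar API: SchemaKind, SchemaMappingSketch, currentDefault, and proposedDefault are hypothetical names made up for illustration, and the table reflects my reading of the primitive-type list in [1], so please check it against the document rather than treating it as the actual TopicSchema logic.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Pulsar's schema types; not the real SchemaType enum.
enum SchemaKind {
    NONE, BYTES, STRING, BOOLEAN, INT8, INT16, INT32, INT64, FLOAT, DOUBLE,
    INSTANT, LOCAL_DATE, LOCAL_TIME, LOCAL_DATE_TIME, JSON
}

class SchemaMappingSketch {
    // Defaulting as described in this thread: bytes -> NONE, String -> STRING,
    // everything else (including other primitives) -> JSON.
    static SchemaKind currentDefault(Class<?> clazz) {
        if (byte[].class.equals(clazz)) {
            return SchemaKind.NONE;
        }
        if (String.class.equals(clazz)) {
            return SchemaKind.STRING;
        }
        return SchemaKind.JSON;
    }

    // Assumed class-to-schema table, based on the primitive types listed in [1].
    private static final Map<Class<?>, SchemaKind> PROPOSED = new HashMap<>();
    static {
        PROPOSED.put(byte[].class, SchemaKind.BYTES);
        PROPOSED.put(String.class, SchemaKind.STRING);
        PROPOSED.put(Boolean.class, SchemaKind.BOOLEAN);
        PROPOSED.put(Byte.class, SchemaKind.INT8);
        PROPOSED.put(Short.class, SchemaKind.INT16);
        PROPOSED.put(Integer.class, SchemaKind.INT32);
        PROPOSED.put(Long.class, SchemaKind.INT64);
        PROPOSED.put(Float.class, SchemaKind.FLOAT);
        PROPOSED.put(Double.class, SchemaKind.DOUBLE);
        PROPOSED.put(Instant.class, SchemaKind.INSTANT);
        PROPOSED.put(LocalDate.class, SchemaKind.LOCAL_DATE);
        PROPOSED.put(LocalTime.class, SchemaKind.LOCAL_TIME);
        PROPOSED.put(LocalDateTime.class, SchemaKind.LOCAL_DATE_TIME);
    }

    // Proposed defaulting: use the primitive schema when one exists, else JSON.
    static SchemaKind proposedDefault(Class<?> clazz) {
        return PROPOSED.getOrDefault(clazz, SchemaKind.JSON);
    }

    public static void main(String[] args) {
        Class<?>[] samples = {byte[].class, String.class, Integer.class, Long.class};
        for (Class<?> c : samples) {
            System.out.println(c.getSimpleName() + ": "
                    + currentDefault(c) + " -> " + proposedDefault(c));
        }
    }
}
```

In this sketch only String keeps the same schema under both methods; every other sample class changes, which is where the compatibility concern comes from.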

These differences confuse users trying Pulsar Functions for the first
time, and they mean Pulsar Functions do not follow the official Pulsar
Schema documentation.

Ideally, we should change the code [2] so that it follows [1]. But such a
change may break behavior for existing users who adapted their code to run
on Pulsar Functions.

I would like to hear your thoughts on this and see how we should proceed.

Thank you! Regards

[1] https://pulsar.apache.org/docs/2.11.x/schema-understand/#primitive-type
[2]
https://github.com/apache/pulsar/blob/master/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/source/TopicSchema.java#L124

Re: [DISCUSS] Improve Pulsar Function Source Primitive Schema Mapping

Posted by Neng Lu <nl...@apache.org>.
Hi All,

Here's the PR for this proposed change:
https://github.com/apache/pulsar/pull/20294
If you have time, please take a look.

On Fri, May 5, 2023 at 6:08 AM Rui Fu <rf...@apache.org> wrote:

> Hi Neng,
>
> Thanks for bringing this issue up. Using JSON as the default schema to
> wrap other primitive types is counterintuitive, so +1 to making [2] align
> with [1] so that both Pulsar Source and Pulsar Sink correctly support the
> other primitive types.
>
> Also, per the code [3], if the topic already exists, the code tries to use
> the existing schema instead of the schema type returned by [2]. So the
> change will only affect newly deployed instances.
>
> [3]
> https://github.com/apache/pulsar/blob/branch-3.0/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/source/TopicSchema.java#L102-L122
>
> Best,
>
> Rui Fu

Re: [DISCUSS] Improve Pulsar Function Source Primitive Schema Mapping

Posted by Rui Fu <rf...@apache.org>.
Hi Neng,

Thanks for bringing this issue up. Using JSON as the default schema to
wrap other primitive types is counterintuitive, so +1 to making [2] align
with [1] so that both Pulsar Source and Pulsar Sink correctly support the
other primitive types.

Also, per the code [3], if the topic already exists, the code tries to use
the existing schema instead of the schema type returned by [2]. So the
change will only affect newly deployed instances.

[3] https://github.com/apache/pulsar/blob/branch-3.0/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/source/TopicSchema.java#L102-L122
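
The precedence described above can be sketched as follows. This is only an illustration of the rule "existing topic schema first, class-derived default second"; SchemaResolutionSketch and resolve are hypothetical names, schema types are modeled as plain strings, and none of this is the actual code in [3].

```java
import java.util.Optional;

class SchemaResolutionSketch {
    // Hypothetical helper: prefer the schema type already registered on the
    // topic; fall back to the class-derived default only when there is none.
    static String resolve(Optional<String> existingTopicSchema, String defaultForClass) {
        return existingTopicSchema.orElse(defaultForClass);
    }

    public static void main(String[] args) {
        // Topic already has a JSON schema: the new default is not applied.
        System.out.println(resolve(Optional.of("JSON"), "INT32"));   // prints JSON
        // Topic has no schema yet: the class-derived default is used.
        System.out.println(resolve(Optional.empty(), "INT32"));      // prints INT32
    }
}
```

Under this assumption, a changed default only takes effect for topics that do not have a schema yet.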

Best,

Rui Fu
On Apr 28, 2023 at 13:36 +0800, Pengcheng Jiang <pe...@streamnative.io.invalid>, wrote:
> Hello Neng,
>
> IMO, we should update the code [2] to follow the doc. Existing functions
> that are already running won't hit the code [2]; on a new run, functions
> will fail to start, which will remind users to update their functions.
>
> Regards,
> Pengcheng Jiang

Re: [DISCUSS] Improve Pulsar Function Source Primitive Schema Mapping

Posted by Pengcheng Jiang <pe...@streamnative.io.INVALID>.
Hello Neng,

IMO, we should update the code [2] to follow the doc. Existing functions
that are already running won't hit the code [2]; on a new run, functions
will fail to start, which will remind users to update their functions.

Regards,
Pengcheng Jiang
