You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Shammon FY <zj...@gmail.com> on 2023/02/28 06:14:10 UTC

[DISCUSS] Deprecate deserialize method in DeserializationSchema

Hi devs

Currently there are two deserialization methods in `DeserializationSchema`
1. `T deserialize(byte[] message)`, only deserialize one record from
binary, if there is no record it should return null.
2. `void deserialize(byte[] message, Collector<T> out)`, supports
deserializing none, one or multiple records gracefully, it can completely
replace method `T deserialize(byte[] message)`.

The deserialization logic in the above two methods is basically coincident,
we recommend users use the second method to deserialize data. To improve
code maintainability, I'd like to mark the first function as `@Deprecated`,
and remove it when it is no longer used in the future.

I have created an issue[1] to track it, looking forward to your feedback,
thanks

[1] https://issues.apache.org/jira/browse/FLINK-31251


Best,
Shammon

Re: [DISCUSS] Deprecate deserialize method in DeserializationSchema

Posted by Jing Ge <ji...@ververica.com.INVALID>.
Hi,

I have to agree with what Huang and Jingsong said. We should think more
about it from the user's(developers who use the API) perspective. The first
method T deserialize(byte[]) is convenient for users to deserialize a
single event. Many formats are using it, i.e. Avro, csv, etc. There should
also be Json cases for single event deserialization, if I am not mistaken.

void deserialize(byte[], Collector<T>) method is a default interface
method. There will be a big code refactoring if we use it to replace T
deserialize(byte[]). The benefit is very limited after considering all
concerns.

For few cases that do not need T deserialize(byte[]), the
maintenance effort is IMHO acceptable. It is, after all, only one method.

Best regards,
Jing

On Tue, Feb 28, 2023 at 9:47 AM Benchao Li <li...@apache.org> wrote:

> I share the same concerns with Jingsong and Hang, however, I'll raise a
> point why keeping both is also not a good idea.
>
> In FLINK-18590[1], we are introducing a feature that we'll deserialize JSON
> array into multiple records. This feature can only be used in `void
> deserialize(byte[] message, Collector<T> out)`. And many more cdc formats
> are doing the similar thing. If we keep both methods, many formats/features
> will not be available for `T deserialize(byte[] message)`.
>
> And for format maintenance, we usually need to keep these two methods,
> which is also a burden for format maintainers.
>
> [1] https://issues.apache.org/jira/browse/FLINK-18590
>
> Jingsong Li <ji...@gmail.com> 于2023年2月28日周二 16:03写道:
>
> > - `T deserialize(byte[] message)` is widely used and it is a public
> > api. It is very friendly for single record deserializers.
> > - `void deserialize(byte[] message, Collector<T> out)` supports
> > multiple records.
> >
> > I think we can just keep them as they are.
> >
> > Best,
> > Jingsong
> >
> >
> > On Tue, Feb 28, 2023 at 3:08 PM Hang Ruan <ru...@gmail.com>
> wrote:
> > >
> > > Hi, Shammon,
> > >
> > > I think the method `void deserialize(byte[] message, Collector<T> out)`
> > > with a default implementation encapsulate how to deal with null for
> > > developers. If we remove the `T deserialize(byte[] message)`, the
> > > developers have to remember to handle null. Maybe we will get duplicate
> > > code among them.
> > > And I find there are only 5 implementations override the method `void
> > > deserialize(byte[] message, Collector<T> out)`. Other implementations
> > reuse
> > > the same code to handle null.
> > > I don't know the benefits of removing this method. Looking forward to
> > other
> > > people's opinions.
> > >
> > > Best,
> > > Hang
> > >
> > >
> > >
> > > Shammon FY <zj...@gmail.com> 于2023年2月28日周二 14:14写道:
> > >
> > > > Hi devs
> > > >
> > > > Currently there are two deserialization methods in
> > `DeserializationSchema`
> > > > 1. `T deserialize(byte[] message)`, only deserialize one record from
> > > > binary, if there is no record it should return null.
> > > > 2. `void deserialize(byte[] message, Collector<T> out)`, supports
> > > > deserializing none, one or multiple records gracefully, it can
> > completely
> > > > replace method `T deserialize(byte[] message)`.
> > > >
> > > > The deserialization logic in the above two methods is basically
> > coincident,
> > > > we recommend users use the second method to deserialize data. To
> > improve
> > > > code maintainability, I'd like to mark the first function as
> > `@Deprecated`,
> > > > and remove it when it is no longer used in the future.
> > > >
> > > > I have created an issue[1] to track it, looking forward to your
> > feedback,
> > > > thanks
> > > >
> > > > [1] https://issues.apache.org/jira/browse/FLINK-31251
> > > >
> > > >
> > > > Best,
> > > > Shammon
> > > >
> >
>
>
> --
>
> Best,
> Benchao Li
>

Re: [DISCUSS] Deprecate deserialize method in DeserializationSchema

Posted by Benchao Li <li...@apache.org>.
I share the same concerns with Jingsong and Hang, however, I'll raise a
point why keeping both is also not a good idea.

In FLINK-18590[1], we are introducing a feature that we'll deserialize JSON
array into multiple records. This feature can only be used in `void
deserialize(byte[] message, Collector<T> out)`. And many more cdc formats
are doing the similar thing. If we keep both methods, many formats/features
will not be available for `T deserialize(byte[] message)`.

And for format maintenance, we usually need to keep these two methods,
which is also a burden for format maintainers.

[1] https://issues.apache.org/jira/browse/FLINK-18590

Jingsong Li <ji...@gmail.com> 于2023年2月28日周二 16:03写道:

> - `T deserialize(byte[] message)` is widely used and it is a public
> api. It is very friendly for single record deserializers.
> - `void deserialize(byte[] message, Collector<T> out)` supports
> multiple records.
>
> I think we can just keep them as they are.
>
> Best,
> Jingsong
>
>
> On Tue, Feb 28, 2023 at 3:08 PM Hang Ruan <ru...@gmail.com> wrote:
> >
> > Hi, Shammon,
> >
> > I think the method `void deserialize(byte[] message, Collector<T> out)`
> > with a default implementation encapsulate how to deal with null for
> > developers. If we remove the `T deserialize(byte[] message)`, the
> > developers have to remember to handle null. Maybe we will get duplicate
> > code among them.
> > And I find there are only 5 implementations override the method `void
> > deserialize(byte[] message, Collector<T> out)`. Other implementations
> reuse
> > the same code to handle null.
> > I don't know the benefits of removing this method. Looking forward to
> other
> > people's opinions.
> >
> > Best,
> > Hang
> >
> >
> >
> > Shammon FY <zj...@gmail.com> 于2023年2月28日周二 14:14写道:
> >
> > > Hi devs
> > >
> > > Currently there are two deserialization methods in
> `DeserializationSchema`
> > > 1. `T deserialize(byte[] message)`, only deserialize one record from
> > > binary, if there is no record it should return null.
> > > 2. `void deserialize(byte[] message, Collector<T> out)`, supports
> > > deserializing none, one or multiple records gracefully, it can
> completely
> > > replace method `T deserialize(byte[] message)`.
> > >
> > > The deserialization logic in the above two methods is basically
> coincident,
> > > we recommend users use the second method to deserialize data. To
> improve
> > > code maintainability, I'd like to mark the first function as
> `@Deprecated`,
> > > and remove it when it is no longer used in the future.
> > >
> > > I have created an issue[1] to track it, looking forward to your
> feedback,
> > > thanks
> > >
> > > [1] https://issues.apache.org/jira/browse/FLINK-31251
> > >
> > >
> > > Best,
> > > Shammon
> > >
>


-- 

Best,
Benchao Li

Re: [DISCUSS] Deprecate deserialize method in DeserializationSchema

Posted by Jingsong Li <ji...@gmail.com>.
- `T deserialize(byte[] message)` is widely used and it is a public
api. It is very friendly for single record deserializers.
- `void deserialize(byte[] message, Collector<T> out)` supports
multiple records.

I think we can just keep them as they are.

Best,
Jingsong


On Tue, Feb 28, 2023 at 3:08 PM Hang Ruan <ru...@gmail.com> wrote:
>
> Hi, Shammon,
>
> I think the method `void deserialize(byte[] message, Collector<T> out)`
> with a default implementation encapsulate how to deal with null for
> developers. If we remove the `T deserialize(byte[] message)`, the
> developers have to remember to handle null. Maybe we will get duplicate
> code among them.
> And I find there are only 5 implementations override the method `void
> deserialize(byte[] message, Collector<T> out)`. Other implementations reuse
> the same code to handle null.
> I don't know the benefits of removing this method. Looking forward to other
> people's opinions.
>
> Best,
> Hang
>
>
>
> Shammon FY <zj...@gmail.com> 于2023年2月28日周二 14:14写道:
>
> > Hi devs
> >
> > Currently there are two deserialization methods in `DeserializationSchema`
> > 1. `T deserialize(byte[] message)`, only deserialize one record from
> > binary, if there is no record it should return null.
> > 2. `void deserialize(byte[] message, Collector<T> out)`, supports
> > deserializing none, one or multiple records gracefully, it can completely
> > replace method `T deserialize(byte[] message)`.
> >
> > The deserialization logic in the above two methods is basically coincident,
> > we recommend users use the second method to deserialize data. To improve
> > code maintainability, I'd like to mark the first function as `@Deprecated`,
> > and remove it when it is no longer used in the future.
> >
> > I have created an issue[1] to track it, looking forward to your feedback,
> > thanks
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-31251
> >
> >
> > Best,
> > Shammon
> >

Re: [DISCUSS] Deprecate deserialize method in DeserializationSchema

Posted by Hang Ruan <ru...@gmail.com>.
Hi, Shammon,

I think the method `void deserialize(byte[] message, Collector<T> out)`
with a default implementation encapsulate how to deal with null for
developers. If we remove the `T deserialize(byte[] message)`, the
developers have to remember to handle null. Maybe we will get duplicate
code among them.
And I find there are only 5 implementations override the method `void
deserialize(byte[] message, Collector<T> out)`. Other implementations reuse
the same code to handle null.
I don't know the benefits of removing this method. Looking forward to other
people's opinions.

Best,
Hang



Shammon FY <zj...@gmail.com> 于2023年2月28日周二 14:14写道:

> Hi devs
>
> Currently there are two deserialization methods in `DeserializationSchema`
> 1. `T deserialize(byte[] message)`, only deserialize one record from
> binary, if there is no record it should return null.
> 2. `void deserialize(byte[] message, Collector<T> out)`, supports
> deserializing none, one or multiple records gracefully, it can completely
> replace method `T deserialize(byte[] message)`.
>
> The deserialization logic in the above two methods is basically coincident,
> we recommend users use the second method to deserialize data. To improve
> code maintainability, I'd like to mark the first function as `@Deprecated`,
> and remove it when it is no longer used in the future.
>
> I have created an issue[1] to track it, looking forward to your feedback,
> thanks
>
> [1] https://issues.apache.org/jira/browse/FLINK-31251
>
>
> Best,
> Shammon
>