You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Marios Trivyzas <ma...@gmail.com> on 2022/02/22 09:06:51 UTC

[DISCUSS] CAST legacy behaviour

Hello all!

I would like to bring back the discussion regarding the
*table.exec.legacy-cast-behaviour*
configuration option which we are introducing with Flink *1.15*. This
option provides the users
with the flexibility to continue using the old (incorrect, according to SQL
standards) behaviour
of *CAST.*

With Flink *1.15* we have introduced a bunch of fixes, improvements and new
casting functionality
between types, see https://issues.apache.org/jira/browse/FLINK-24403, and
some of them are
guarded behind the legacy behaviour option:

   - Trimming and padding when casting to CHAR/VARCHAR types to respect the
   specified length
   - Changes for casting various types to CHAR/VARCHAR/STRING
   - Runtime errors for CAST no longer emit *null *as result but exceptions
   are thrown with a meaningful message for the cause, that fail the
pipeline. *TRY_CAST
   *is introduced instead, which emits *null* results instead of throwing
   exceptions.

Those changes become active if users set the *table.exec.legacy-cast-behaviour
*option to *DISABLED*, otherwise
they will continue to experience the old, *erroneous*, behaviour of *CAST*.

Currently, we have set the *table.exec.legacy-cast-behaviour *option
to be *ENABLED
*by default, so if users want
to get the new correct behaviour, they are required to set explicitly the
option to *DISABLED*. Moreover, the option
itself is marked as deprecated, since the plan is to be removed in the
future, so that the old, erroneous behaviour
won't be an option, and the *TRY_CAST* would be the way to go if users
don't want to have errors and failed pipelines,
but have *null*s emitted in case of runtime errors when casting.

I would like to start a discussion and maybe ask for voting, so that we set
the *table.exec.legacy-cast-behaviour* option
to *DISABLED *by default, so that users that want to keep their old
pipelines working the same way, without changing their
SQL/TableAPI code, would need to explicitly set it to *ENABLED.*

My main argument for changing the default value for the option, is that the
*DISABLED* value is the one that enables
the *correct* behaviour for CAST which should be the default for all new
users. This way, new FLINK users, or users which
build new pipelines, from now on would get the correct behaviour by default
without the need of changing some flag in their
configuration. It feels weird to me, especially for people very familiar
with standard SQL, to be obliged to set some config
flag, to be able to get the correct behaviour for CAST. On top, users that
won't read about this option in our docs, will,
"blindly", experience the old incorrect behaviour for their new pipelines,
and issues that could cause the CAST to fail
will remain hidden from them, since *nulls* would be emitted. IMHO, this
last part is also very important during the development
stages of an application/pipeline. Normally, developers would want to see
all possible errors/scenarios during development
stages , in order to build a robust production system. If errors, are
hidden then, they can easily end up with those errors
in the production system which would be even harder to discover and debug,
since no exception will ever be thrown.
Imagine that there is a CAST which generates nulls because of runtime
errors, and its result is used in an aggregate function:

SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY ....

If nulls are emitted by the CAST (let's say because some records have a
quoted string value for col1, then simply the AVG result
would be wrong and in would be extremely difficult to realise and fix the
issue by simply wrapping col1 first with, for example,
a REGEXP_REPLACE, to get rid of the quotes.

My second argument is that, since we have marked
*table.exec.legacy-cast-behaviour* as deprecated, and we want to completely
remove it in the future, if the default value is *DISABLED*, when it's
removed we also make a breaking change, by changing the
default behaviour for all users, which is against the common software
practices. You introduce a deprecated flag to help users
using old versions of the software to smoothly transition to the new
version, while the new users experience the new features/behaviour,
without the need to set a flag. So, when in the future this flag is
completely removed, for those "new" users it would be completely
transparent. If we continue with having the default value *ENABLED*, those
new user would experience an "unnatural" breaking change
when this option is completely removed.

Best,
Marios

Re: [DISCUSS] CAST legacy behaviour

Posted by Francesco Guardiani <fr...@ververica.com>.
Hi all,
As I see this thread has consensus, here is the issue and PR to disable the
legacy behavior by default:
https://issues.apache.org/jira/browse/FLINK-26551
https://github.com/apache/flink/pull/19020

We target to merge it before the release of 1.15, unless there are any
objections.

FG

On Tue, Mar 1, 2022 at 2:18 PM Marios Trivyzas <ma...@gmail.com> wrote:

> Indeed, if we manage to use the configuration from *flink-conf.yaml* down
> the stack,
> it would be easy for everyone to configure a "system-wide" legacy cast
> behaviour.
>
> Best regards,
> Marios
>
> On Tue, Mar 1, 2022 at 2:52 PM Timo Walther <tw...@apache.org> wrote:
>
> > +1
> >
> > Thanks for bringing up this discussion one more time Marios.
> >
> > I strongly support enabling the new behavior in 1.15. It definitely has
> > implications on existing users, but as Seth said, thinking about the
> > upcoming upgrade story we need to make sure that at least the core/basic
> > operations are correct. Otherwise we will have to maintain multiple
> > versions of functions with broken semantics.
> >
> > I since we also try to fix various issues around configuration, maybe it
> > might still be possible to configure the legacy cast behavior globally
> > via flink-conf.yaml. This should make the transitioning period easier in
> > production.
> >
> > Regards,
> > Timo
> >
> > Am 28.02.22 um 19:04 schrieb Seth Wiesman:
> > > +1
> > >
> > > Especially as SQL upgrades are right around the corner, it makes sense
> to
> > > get our defaults right.
> > >
> > > Seth
> > >
> > > On Mon, Feb 28, 2022 at 7:14 AM Martijn Visser <ma...@ververica.com>
> > > wrote:
> > >
> > >> +1 for setting this option to disabled by default. I believe failures
> > >> should be brought forward as soon as possible, so they can be fixed as
> > fast
> > >> as possible. It will also be less confusing for new users. Last but
> not
> > >> least, I believe the impact on existing users will be minimal (since
> it
> > can
> > >> be changed by changing one flag).
> > >>
> > >> Best regards,
> > >>
> > >> Martijn
> > >>
> > >> On Tue, 22 Feb 2022 at 17:55, Marios Trivyzas <ma...@gmail.com>
> wrote:
> > >>
> > >>> Thanks Francesco,
> > >>>
> > >>> The two arguments you posted, further strengthen the need to make it
> > >>> DISABLED by default.
> > >>>
> > >>> On Tue, Feb 22, 2022 at 12:10 PM Francesco Guardiani <
> > >>> francesco@ververica.com> wrote:
> > >>>
> > >>>> Hi all,
> > >>>> I'm +1 with what everything you said Marios.
> > >>>>
> > >>>> I'm gonna add another argument on top of that: the
> > >> "legacy-cast-behavior"
> > >>>> has also a broken type inference, leading to incorrect results or
> > >> further
> > >>>> errors down in the pipeline[1]. For example, take this:
> > >>>>
> > >>>> SELECT COALESCE(CAST('a' AS INT), 0) ...
> > >>>>
> > >>>> With the legacy cast behavior ENABLED, this is going to lead to the
> > >> wrong
> > >>>> result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return
> value
> > >> is
> > >>>> inferred as INT NOT NULL, so the planner will drop COALESCE, and
> will
> > >>> never
> > >>>> return 0. Essentially, CAST will infer the wrong nullability leading
> > to
> > >>>> assigning its result to a NOT NULL type, when its value can
> > effectively
> > >>> be
> > >>>> NULL.
> > >>>>
> > >>>>> You introduce a deprecated flag to help users
> > >>>> using old versions of the software to smoothly transition to the new
> > >>>> version, while the new users experience the new features/behavior,
> > >>>> without the need to set a flag.
> > >>>>
> > >>>> This is IMO the major point on why we should disable the legacy cast
> > >>>> behavior by default. This is even more relevant with 1.15 and the
> > >> upgrade
> > >>>> story, as per the problem described above, because users will now
> end
> > >> up
> > >>>> generating plans with wrong type inference, which will be hard to
> > >> migrate
> > >>>> in the next flink versions.
> > >>>>
> > >>>> FG
> > >>>>
> > >>>> [1] In case you're wondering why it wasn't fixed, the reason is that
> > >>> fixing
> > >>>> it means creating a breaking change, for details
> > >>>> https://github.com/apache/flink/pull/18611#issuecomment-1028174877
> > >>>>
> > >>>>
> > >>>> On Tue, Feb 22, 2022 at 10:07 AM Marios Trivyzas <ma...@gmail.com>
> > >>> wrote:
> > >>>>> Hello all!
> > >>>>>
> > >>>>> I would like to bring back the discussion regarding the
> > >>>>> *table.exec.legacy-cast-behaviour*
> > >>>>> configuration option which we are introducing with Flink *1.15*.
> This
> > >>>>> option provides the users
> > >>>>> with the flexibility to continue using the old (incorrect,
> according
> > >> to
> > >>>> SQL
> > >>>>> standards) behaviour
> > >>>>> of *CAST.*
> > >>>>>
> > >>>>> With Flink *1.15* we have introduced a bunch of fixes, improvements
> > >> and
> > >>>> new
> > >>>>> casting functionality
> > >>>>> between types, see
> https://issues.apache.org/jira/browse/FLINK-24403
> > >> ,
> > >>>> and
> > >>>>> some of them are
> > >>>>> guarded behind the legacy behaviour option:
> > >>>>>
> > >>>>>     - Trimming and padding when casting to CHAR/VARCHAR types to
> > >> respect
> > >>>> the
> > >>>>>     specified length
> > >>>>>     - Changes for casting various types to CHAR/VARCHAR/STRING
> > >>>>>     - Runtime errors for CAST no longer emit *null *as result but
> > >>>> exceptions
> > >>>>>     are thrown with a meaningful message for the cause, that fail
> the
> > >>>>> pipeline. *TRY_CAST
> > >>>>>     *is introduced instead, which emits *null* results instead of
> > >>> throwing
> > >>>>>     exceptions.
> > >>>>>
> > >>>>> Those changes become active if users set the
> > >>>>> *table.exec.legacy-cast-behaviour
> > >>>>> *option to *DISABLED*, otherwise
> > >>>>> they will continue to experience the old, *erroneous*, behaviour of
> > >>>> *CAST*.
> > >>>>> Currently, we have set the *table.exec.legacy-cast-behaviour
> *option
> > >>>>> to be *ENABLED
> > >>>>> *by default, so if users want
> > >>>>> to get the new correct behaviour, they are required to set
> explicitly
> > >>> the
> > >>>>> option to *DISABLED*. Moreover, the option
> > >>>>> itself is marked as deprecated, since the plan is to be removed in
> > >> the
> > >>>>> future, so that the old, erroneous behaviour
> > >>>>> won't be an option, and the *TRY_CAST* would be the way to go if
> > >> users
> > >>>>> don't want to have errors and failed pipelines,
> > >>>>> but have *null*s emitted in case of runtime errors when casting.
> > >>>>>
> > >>>>> I would like to start a discussion and maybe ask for voting, so
> that
> > >> we
> > >>>> set
> > >>>>> the *table.exec.legacy-cast-behaviour* option
> > >>>>> to *DISABLED *by default, so that users that want to keep their old
> > >>>>> pipelines working the same way, without changing their
> > >>>>> SQL/TableAPI code, would need to explicitly set it to *ENABLED.*
> > >>>>>
> > >>>>> My main argument for changing the default value for the option, is
> > >> that
> > >>>> the
> > >>>>> *DISABLED* value is the one that enables
> > >>>>> the *correct* behaviour for CAST which should be the default for
> all
> > >>> new
> > >>>>> users. This way, new FLINK users, or users which
> > >>>>> build new pipelines, from now on would get the correct behaviour by
> > >>>> default
> > >>>>> without the need of changing some flag in their
> > >>>>> configuration. It feels weird to me, especially for people very
> > >>> familiar
> > >>>>> with standard SQL, to be obliged to set some config
> > >>>>> flag, to be able to get the correct behaviour for CAST. On top,
> users
> > >>>> that
> > >>>>> won't read about this option in our docs, will,
> > >>>>> "blindly", experience the old incorrect behaviour for their new
> > >>>> pipelines,
> > >>>>> and issues that could cause the CAST to fail
> > >>>>> will remain hidden from them, since *nulls* would be emitted. IMHO,
> > >>> this
> > >>>>> last part is also very important during the development
> > >>>>> stages of an application/pipeline. Normally, developers would want
> to
> > >>> see
> > >>>>> all possible errors/scenarios during development
> > >>>>> stages , in order to build a robust production system. If errors,
> are
> > >>>>> hidden then, they can easily end up with those errors
> > >>>>> in the production system which would be even harder to discover and
> > >>>> debug,
> > >>>>> since no exception will ever be thrown.
> > >>>>> Imagine that there is a CAST which generates nulls because of
> runtime
> > >>>>> errors, and its result is used in an aggregate function:
> > >>>>>
> > >>>>> SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY ....
> > >>>>>
> > >>>>> If nulls are emitted by the CAST (let's say because some records
> > >> have a
> > >>>>> quoted string value for col1, then simply the AVG result
> > >>>>> would be wrong and in would be extremely difficult to realise and
> fix
> > >>> the
> > >>>>> issue by simply wrapping col1 first with, for example,
> > >>>>> a REGEXP_REPLACE, to get rid of the quotes.
> > >>>>>
> > >>>>> My second argument is that, since we have marked
> > >>>>> *table.exec.legacy-cast-behaviour* as deprecated, and we want to
> > >>>> completely
> > >>>>> remove it in the future, if the default value is *DISABLED*, when
> > >> it's
> > >>>>> removed we also make a breaking change, by changing the
> > >>>>> default behaviour for all users, which is against the common
> software
> > >>>>> practices. You introduce a deprecated flag to help users
> > >>>>> using old versions of the software to smoothly transition to the
> new
> > >>>>> version, while the new users experience the new features/behaviour,
> > >>>>> without the need to set a flag. So, when in the future this flag is
> > >>>>> completely removed, for those "new" users it would be completely
> > >>>>> transparent. If we continue with having the default value
> *ENABLED*,
> > >>>> those
> > >>>>> new user would experience an "unnatural" breaking change
> > >>>>> when this option is completely removed.
> > >>>>>
> > >>>>> Best,
> > >>>>> Marios
> > >>>>>
> > >>>
> > >>> --
> > >>> Marios
> > >>>
> >
> >
>
> --
> Marios
>

Re: [DISCUSS] CAST legacy behaviour

Posted by Marios Trivyzas <ma...@gmail.com>.
Indeed, if we manage to use the configuration from *flink-conf.yaml* down
the stack,
it would be easy for everyone to configure a "system-wide" legacy cast
behaviour.

Best regards,
Marios

On Tue, Mar 1, 2022 at 2:52 PM Timo Walther <tw...@apache.org> wrote:

> +1
>
> Thanks for bringing up this discussion one more time Marios.
>
> I strongly support enabling the new behavior in 1.15. It definitely has
> implications on existing users, but as Seth said, thinking about the
> upcoming upgrade story we need to make sure that at least the core/basic
> operations are correct. Otherwise we will have to maintain multiple
> versions of functions with broken semantics.
>
> I since we also try to fix various issues around configuration, maybe it
> might still be possible to configure the legacy cast behavior globally
> via flink-conf.yaml. This should make the transitioning period easier in
> production.
>
> Regards,
> Timo
>
> Am 28.02.22 um 19:04 schrieb Seth Wiesman:
> > +1
> >
> > Especially as SQL upgrades are right around the corner, it makes sense to
> > get our defaults right.
> >
> > Seth
> >
> > On Mon, Feb 28, 2022 at 7:14 AM Martijn Visser <ma...@ververica.com>
> > wrote:
> >
> >> +1 for setting this option to disabled by default. I believe failures
> >> should be brought forward as soon as possible, so they can be fixed as
> fast
> >> as possible. It will also be less confusing for new users. Last but not
> >> least, I believe the impact on existing users will be minimal (since it
> can
> >> be changed by changing one flag).
> >>
> >> Best regards,
> >>
> >> Martijn
> >>
> >> On Tue, 22 Feb 2022 at 17:55, Marios Trivyzas <ma...@gmail.com> wrote:
> >>
> >>> Thanks Francesco,
> >>>
> >>> The two arguments you posted, further strengthen the need to make it
> >>> DISABLED by default.
> >>>
> >>> On Tue, Feb 22, 2022 at 12:10 PM Francesco Guardiani <
> >>> francesco@ververica.com> wrote:
> >>>
> >>>> Hi all,
> >>>> I'm +1 with what everything you said Marios.
> >>>>
> >>>> I'm gonna add another argument on top of that: the
> >> "legacy-cast-behavior"
> >>>> has also a broken type inference, leading to incorrect results or
> >> further
> >>>> errors down in the pipeline[1]. For example, take this:
> >>>>
> >>>> SELECT COALESCE(CAST('a' AS INT), 0) ...
> >>>>
> >>>> With the legacy cast behavior ENABLED, this is going to lead to the
> >> wrong
> >>>> result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return value
> >> is
> >>>> inferred as INT NOT NULL, so the planner will drop COALESCE, and will
> >>> never
> >>>> return 0. Essentially, CAST will infer the wrong nullability leading
> to
> >>>> assigning its result to a NOT NULL type, when its value can
> effectively
> >>> be
> >>>> NULL.
> >>>>
> >>>>> You introduce a deprecated flag to help users
> >>>> using old versions of the software to smoothly transition to the new
> >>>> version, while the new users experience the new features/behavior,
> >>>> without the need to set a flag.
> >>>>
> >>>> This is IMO the major point on why we should disable the legacy cast
> >>>> behavior by default. This is even more relevant with 1.15 and the
> >> upgrade
> >>>> story, as per the problem described above, because users will now end
> >> up
> >>>> generating plans with wrong type inference, which will be hard to
> >> migrate
> >>>> in the next flink versions.
> >>>>
> >>>> FG
> >>>>
> >>>> [1] In case you're wondering why it wasn't fixed, the reason is that
> >>> fixing
> >>>> it means creating a breaking change, for details
> >>>> https://github.com/apache/flink/pull/18611#issuecomment-1028174877
> >>>>
> >>>>
> >>>> On Tue, Feb 22, 2022 at 10:07 AM Marios Trivyzas <ma...@gmail.com>
> >>> wrote:
> >>>>> Hello all!
> >>>>>
> >>>>> I would like to bring back the discussion regarding the
> >>>>> *table.exec.legacy-cast-behaviour*
> >>>>> configuration option which we are introducing with Flink *1.15*. This
> >>>>> option provides the users
> >>>>> with the flexibility to continue using the old (incorrect, according
> >> to
> >>>> SQL
> >>>>> standards) behaviour
> >>>>> of *CAST.*
> >>>>>
> >>>>> With Flink *1.15* we have introduced a bunch of fixes, improvements
> >> and
> >>>> new
> >>>>> casting functionality
> >>>>> between types, see https://issues.apache.org/jira/browse/FLINK-24403
> >> ,
> >>>> and
> >>>>> some of them are
> >>>>> guarded behind the legacy behaviour option:
> >>>>>
> >>>>>     - Trimming and padding when casting to CHAR/VARCHAR types to
> >> respect
> >>>> the
> >>>>>     specified length
> >>>>>     - Changes for casting various types to CHAR/VARCHAR/STRING
> >>>>>     - Runtime errors for CAST no longer emit *null *as result but
> >>>> exceptions
> >>>>>     are thrown with a meaningful message for the cause, that fail the
> >>>>> pipeline. *TRY_CAST
> >>>>>     *is introduced instead, which emits *null* results instead of
> >>> throwing
> >>>>>     exceptions.
> >>>>>
> >>>>> Those changes become active if users set the
> >>>>> *table.exec.legacy-cast-behaviour
> >>>>> *option to *DISABLED*, otherwise
> >>>>> they will continue to experience the old, *erroneous*, behaviour of
> >>>> *CAST*.
> >>>>> Currently, we have set the *table.exec.legacy-cast-behaviour *option
> >>>>> to be *ENABLED
> >>>>> *by default, so if users want
> >>>>> to get the new correct behaviour, they are required to set explicitly
> >>> the
> >>>>> option to *DISABLED*. Moreover, the option
> >>>>> itself is marked as deprecated, since the plan is to be removed in
> >> the
> >>>>> future, so that the old, erroneous behaviour
> >>>>> won't be an option, and the *TRY_CAST* would be the way to go if
> >> users
> >>>>> don't want to have errors and failed pipelines,
> >>>>> but have *null*s emitted in case of runtime errors when casting.
> >>>>>
> >>>>> I would like to start a discussion and maybe ask for voting, so that
> >> we
> >>>> set
> >>>>> the *table.exec.legacy-cast-behaviour* option
> >>>>> to *DISABLED *by default, so that users that want to keep their old
> >>>>> pipelines working the same way, without changing their
> >>>>> SQL/TableAPI code, would need to explicitly set it to *ENABLED.*
> >>>>>
> >>>>> My main argument for changing the default value for the option, is
> >> that
> >>>> the
> >>>>> *DISABLED* value is the one that enables
> >>>>> the *correct* behaviour for CAST which should be the default for all
> >>> new
> >>>>> users. This way, new FLINK users, or users which
> >>>>> build new pipelines, from now on would get the correct behaviour by
> >>>> default
> >>>>> without the need of changing some flag in their
> >>>>> configuration. It feels weird to me, especially for people very
> >>> familiar
> >>>>> with standard SQL, to be obliged to set some config
> >>>>> flag, to be able to get the correct behaviour for CAST. On top, users
> >>>> that
> >>>>> won't read about this option in our docs, will,
> >>>>> "blindly", experience the old incorrect behaviour for their new
> >>>> pipelines,
> >>>>> and issues that could cause the CAST to fail
> >>>>> will remain hidden from them, since *nulls* would be emitted. IMHO,
> >>> this
> >>>>> last part is also very important during the development
> >>>>> stages of an application/pipeline. Normally, developers would want to
> >>> see
> >>>>> all possible errors/scenarios during development
> >>>>> stages , in order to build a robust production system. If errors, are
> >>>>> hidden then, they can easily end up with those errors
> >>>>> in the production system which would be even harder to discover and
> >>>> debug,
> >>>>> since no exception will ever be thrown.
> >>>>> Imagine that there is a CAST which generates nulls because of runtime
> >>>>> errors, and its result is used in an aggregate function:
> >>>>>
> >>>>> SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY ....
> >>>>>
> >>>>> If nulls are emitted by the CAST (let's say because some records
> >> have a
> >>>>> quoted string value for col1, then simply the AVG result
> >>>>> would be wrong and in would be extremely difficult to realise and fix
> >>> the
> >>>>> issue by simply wrapping col1 first with, for example,
> >>>>> a REGEXP_REPLACE, to get rid of the quotes.
> >>>>>
> >>>>> My second argument is that, since we have marked
> >>>>> *table.exec.legacy-cast-behaviour* as deprecated, and we want to
> >>>> completely
> >>>>> remove it in the future, if the default value is *DISABLED*, when
> >> it's
> >>>>> removed we also make a breaking change, by changing the
> >>>>> default behaviour for all users, which is against the common software
> >>>>> practices. You introduce a deprecated flag to help users
> >>>>> using old versions of the software to smoothly transition to the new
> >>>>> version, while the new users experience the new features/behaviour,
> >>>>> without the need to set a flag. So, when in the future this flag is
> >>>>> completely removed, for those "new" users it would be completely
> >>>>> transparent. If we continue with having the default value *ENABLED*,
> >>>> those
> >>>>> new user would experience an "unnatural" breaking change
> >>>>> when this option is completely removed.
> >>>>>
> >>>>> Best,
> >>>>> Marios
> >>>>>
> >>>
> >>> --
> >>> Marios
> >>>
>
>

-- 
Marios

Re: [DISCUSS] CAST legacy behaviour

Posted by Timo Walther <tw...@apache.org>.
+1

Thanks for bringing up this discussion one more time Marios.

I strongly support enabling the new behavior in 1.15. It definitely has 
implications on existing users, but as Seth said, thinking about the 
upcoming upgrade story we need to make sure that at least the core/basic 
operations are correct. Otherwise we will have to maintain multiple 
versions of functions with broken semantics.

I since we also try to fix various issues around configuration, maybe it 
might still be possible to configure the legacy cast behavior globally 
via flink-conf.yaml. This should make the transitioning period easier in 
production.

Regards,
Timo

Am 28.02.22 um 19:04 schrieb Seth Wiesman:
> +1
>
> Especially as SQL upgrades are right around the corner, it makes sense to
> get our defaults right.
>
> Seth
>
> On Mon, Feb 28, 2022 at 7:14 AM Martijn Visser <ma...@ververica.com>
> wrote:
>
>> +1 for setting this option to disabled by default. I believe failures
>> should be brought forward as soon as possible, so they can be fixed as fast
>> as possible. It will also be less confusing for new users. Last but not
>> least, I believe the impact on existing users will be minimal (since it can
>> be changed by changing one flag).
>>
>> Best regards,
>>
>> Martijn
>>
>> On Tue, 22 Feb 2022 at 17:55, Marios Trivyzas <ma...@gmail.com> wrote:
>>
>>> Thanks Francesco,
>>>
>>> The two arguments you posted, further strengthen the need to make it
>>> DISABLED by default.
>>>
>>> On Tue, Feb 22, 2022 at 12:10 PM Francesco Guardiani <
>>> francesco@ververica.com> wrote:
>>>
>>>> Hi all,
>>>> I'm +1 with what everything you said Marios.
>>>>
>>>> I'm gonna add another argument on top of that: the
>> "legacy-cast-behavior"
>>>> has also a broken type inference, leading to incorrect results or
>> further
>>>> errors down in the pipeline[1]. For example, take this:
>>>>
>>>> SELECT COALESCE(CAST('a' AS INT), 0) ...
>>>>
>>>> With the legacy cast behavior ENABLED, this is going to lead to the
>> wrong
>>>> result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return value
>> is
>>>> inferred as INT NOT NULL, so the planner will drop COALESCE, and will
>>> never
>>>> return 0. Essentially, CAST will infer the wrong nullability leading to
>>>> assigning its result to a NOT NULL type, when its value can effectively
>>> be
>>>> NULL.
>>>>
>>>>> You introduce a deprecated flag to help users
>>>> using old versions of the software to smoothly transition to the new
>>>> version, while the new users experience the new features/behavior,
>>>> without the need to set a flag.
>>>>
>>>> This is IMO the major point on why we should disable the legacy cast
>>>> behavior by default. This is even more relevant with 1.15 and the
>> upgrade
>>>> story, as per the problem described above, because users will now end
>> up
>>>> generating plans with wrong type inference, which will be hard to
>> migrate
>>>> in the next flink versions.
>>>>
>>>> FG
>>>>
>>>> [1] In case you're wondering why it wasn't fixed, the reason is that
>>> fixing
>>>> it means creating a breaking change, for details
>>>> https://github.com/apache/flink/pull/18611#issuecomment-1028174877
>>>>
>>>>
>>>> On Tue, Feb 22, 2022 at 10:07 AM Marios Trivyzas <ma...@gmail.com>
>>> wrote:
>>>>> Hello all!
>>>>>
>>>>> I would like to bring back the discussion regarding the
>>>>> *table.exec.legacy-cast-behaviour*
>>>>> configuration option which we are introducing with Flink *1.15*. This
>>>>> option provides the users
>>>>> with the flexibility to continue using the old (incorrect, according
>> to
>>>> SQL
>>>>> standards) behaviour
>>>>> of *CAST.*
>>>>>
>>>>> With Flink *1.15* we have introduced a bunch of fixes, improvements
>> and
>>>> new
>>>>> casting functionality
>>>>> between types, see https://issues.apache.org/jira/browse/FLINK-24403
>> ,
>>>> and
>>>>> some of them are
>>>>> guarded behind the legacy behaviour option:
>>>>>
>>>>>     - Trimming and padding when casting to CHAR/VARCHAR types to
>> respect
>>>> the
>>>>>     specified length
>>>>>     - Changes for casting various types to CHAR/VARCHAR/STRING
>>>>>     - Runtime errors for CAST no longer emit *null *as result but
>>>> exceptions
>>>>>     are thrown with a meaningful message for the cause, that fail the
>>>>> pipeline. *TRY_CAST
>>>>>     *is introduced instead, which emits *null* results instead of
>>> throwing
>>>>>     exceptions.
>>>>>
>>>>> Those changes become active if users set the
>>>>> *table.exec.legacy-cast-behaviour
>>>>> *option to *DISABLED*, otherwise
>>>>> they will continue to experience the old, *erroneous*, behaviour of
>>>> *CAST*.
>>>>> Currently, we have set the *table.exec.legacy-cast-behaviour *option
>>>>> to be *ENABLED
>>>>> *by default, so if users want
>>>>> to get the new correct behaviour, they are required to set explicitly
>>> the
>>>>> option to *DISABLED*. Moreover, the option
>>>>> itself is marked as deprecated, since the plan is to be removed in
>> the
>>>>> future, so that the old, erroneous behaviour
>>>>> won't be an option, and the *TRY_CAST* would be the way to go if
>> users
>>>>> don't want to have errors and failed pipelines,
>>>>> but have *null*s emitted in case of runtime errors when casting.
>>>>>
>>>>> I would like to start a discussion and maybe ask for voting, so that
>> we
>>>> set
>>>>> the *table.exec.legacy-cast-behaviour* option
>>>>> to *DISABLED *by default, so that users that want to keep their old
>>>>> pipelines working the same way, without changing their
>>>>> SQL/TableAPI code, would need to explicitly set it to *ENABLED.*
>>>>>
>>>>> My main argument for changing the default value for the option, is
>> that
>>>> the
>>>>> *DISABLED* value is the one that enables
>>>>> the *correct* behaviour for CAST which should be the default for all
>>> new
>>>>> users. This way, new FLINK users, or users which
>>>>> build new pipelines, from now on would get the correct behaviour by
>>>> default
>>>>> without the need of changing some flag in their
>>>>> configuration. It feels weird to me, especially for people very
>>> familiar
>>>>> with standard SQL, to be obliged to set some config
>>>>> flag, to be able to get the correct behaviour for CAST. On top, users
>>>> that
>>>>> won't read about this option in our docs, will,
>>>>> "blindly", experience the old incorrect behaviour for their new
>>>> pipelines,
>>>>> and issues that could cause the CAST to fail
>>>>> will remain hidden from them, since *nulls* would be emitted. IMHO,
>>> this
>>>>> last part is also very important during the development
>>>>> stages of an application/pipeline. Normally, developers would want to
>>> see
>>>>> all possible errors/scenarios during development
>>>>> stages , in order to build a robust production system. If errors, are
>>>>> hidden then, they can easily end up with those errors
>>>>> in the production system which would be even harder to discover and
>>>> debug,
>>>>> since no exception will ever be thrown.
>>>>> Imagine that there is a CAST which generates nulls because of runtime
>>>>> errors, and its result is used in an aggregate function:
>>>>>
>>>>> SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY ....
>>>>>
>>>>> If nulls are emitted by the CAST (let's say because some records
>> have a
>>>>> quoted string value for col1, then simply the AVG result
>>>>> would be wrong and in would be extremely difficult to realise and fix
>>> the
>>>>> issue by simply wrapping col1 first with, for example,
>>>>> a REGEXP_REPLACE, to get rid of the quotes.
>>>>>
>>>>> My second argument is that, since we have marked
>>>>> *table.exec.legacy-cast-behaviour* as deprecated, and we want to
>>>> completely
>>>>> remove it in the future, if the default value is *DISABLED*, when
>> it's
>>>>> removed we also make a breaking change, by changing the
>>>>> default behaviour for all users, which is against the common software
>>>>> practices. You introduce a deprecated flag to help users
>>>>> using old versions of the software to smoothly transition to the new
>>>>> version, while the new users experience the new features/behaviour,
>>>>> without the need to set a flag. So, when in the future this flag is
>>>>> completely removed, for those "new" users it would be completely
>>>>> transparent. If we continue with having the default value *ENABLED*,
>>>> those
>>>>> new user would experience an "unnatural" breaking change
>>>>> when this option is completely removed.
>>>>>
>>>>> Best,
>>>>> Marios
>>>>>
>>>
>>> --
>>> Marios
>>>


Re: [DISCUSS] CAST legacy behaviour

Posted by Seth Wiesman <sj...@gmail.com>.
+1

Especially as SQL upgrades are right around the corner, it makes sense to
get our defaults right.

Seth

On Mon, Feb 28, 2022 at 7:14 AM Martijn Visser <ma...@ververica.com>
wrote:

> +1 for setting this option to disabled by default. I believe failures
> should be brought forward as soon as possible, so they can be fixed as fast
> as possible. It will also be less confusing for new users. Last but not
> least, I believe the impact on existing users will be minimal (since it can
> be changed by changing one flag).
>
> Best regards,
>
> Martijn
>
> On Tue, 22 Feb 2022 at 17:55, Marios Trivyzas <ma...@gmail.com> wrote:
>
> > Thanks Francesco,
> >
> > The two arguments you posted, further strengthen the need to make it
> > DISABLED by default.
> >
> > On Tue, Feb 22, 2022 at 12:10 PM Francesco Guardiani <
> > francesco@ververica.com> wrote:
> >
> > > Hi all,
> > > I'm +1 with what everything you said Marios.
> > >
> > > I'm gonna add another argument on top of that: the
> "legacy-cast-behavior"
> > > has also a broken type inference, leading to incorrect results or
> further
> > > errors down in the pipeline[1]. For example, take this:
> > >
> > > SELECT COALESCE(CAST('a' AS INT), 0) ...
> > >
> > > With the legacy cast behavior ENABLED, this is going to lead to the
> wrong
> > > result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return value
> is
> > > inferred as INT NOT NULL, so the planner will drop COALESCE, and will
> > never
> > > return 0. Essentially, CAST will infer the wrong nullability leading to
> > > assigning its result to a NOT NULL type, when its value can effectively
> > be
> > > NULL.
> > >
> > > > You introduce a deprecated flag to help users
> > > using old versions of the software to smoothly transition to the new
> > > version, while the new users experience the new features/behavior,
> > > without the need to set a flag.
> > >
> > > This is IMO the major point on why we should disable the legacy cast
> > > behavior by default. This is even more relevant with 1.15 and the
> upgrade
> > > story, as per the problem described above, because users will now end
> up
> > > generating plans with wrong type inference, which will be hard to
> migrate
> > > in the next flink versions.
> > >
> > > FG
> > >
> > > [1] In case you're wondering why it wasn't fixed, the reason is that
> > fixing
> > > it means creating a breaking change, for details
> > > https://github.com/apache/flink/pull/18611#issuecomment-1028174877
> > >
> > >
> > > On Tue, Feb 22, 2022 at 10:07 AM Marios Trivyzas <ma...@gmail.com>
> > wrote:
> > >
> > > > Hello all!
> > > >
> > > > I would like to bring back the discussion regarding the
> > > > *table.exec.legacy-cast-behaviour*
> > > > configuration option which we are introducing with Flink *1.15*. This
> > > > option provides the users
> > > > with the flexibility to continue using the old (incorrect, according
> to
> > > SQL
> > > > standards) behaviour
> > > > of *CAST.*
> > > >
> > > > With Flink *1.15* we have introduced a bunch of fixes, improvements
> and
> > > new
> > > > casting functionality
> > > > between types, see https://issues.apache.org/jira/browse/FLINK-24403
> ,
> > > and
> > > > some of them are
> > > > guarded behind the legacy behaviour option:
> > > >
> > > >    - Trimming and padding when casting to CHAR/VARCHAR types to
> respect
> > > the
> > > >    specified length
> > > >    - Changes for casting various types to CHAR/VARCHAR/STRING
> > > >    - Runtime errors for CAST no longer emit *null *as result but
> > > exceptions
> > > >    are thrown with a meaningful message for the cause, that fail the
> > > > pipeline. *TRY_CAST
> > > >    *is introduced instead, which emits *null* results instead of
> > throwing
> > > >    exceptions.
> > > >
> > > > Those changes become active if users set the
> > > > *table.exec.legacy-cast-behaviour
> > > > *option to *DISABLED*, otherwise
> > > > they will continue to experience the old, *erroneous*, behaviour of
> > > *CAST*.
> > > >
> > > > Currently, we have set the *table.exec.legacy-cast-behaviour *option
> > > > to be *ENABLED
> > > > *by default, so if users want
> > > > to get the new correct behaviour, they are required to set explicitly
> > the
> > > > option to *DISABLED*. Moreover, the option
> > > > itself is marked as deprecated, since the plan is to be removed in
> the
> > > > future, so that the old, erroneous behaviour
> > > > won't be an option, and the *TRY_CAST* would be the way to go if
> users
> > > > don't want to have errors and failed pipelines,
> > > > but have *null*s emitted in case of runtime errors when casting.
> > > >
> > > > I would like to start a discussion and maybe ask for voting, so that
> we
> > > set
> > > > the *table.exec.legacy-cast-behaviour* option
> > > > to *DISABLED *by default, so that users that want to keep their old
> > > > pipelines working the same way, without changing their
> > > > SQL/TableAPI code, would need to explicitly set it to *ENABLED.*
> > > >
> > > > My main argument for changing the default value for the option, is
> that
> > > the
> > > > *DISABLED* value is the one that enables
> > > > the *correct* behaviour for CAST which should be the default for all
> > new
> > > > users. This way, new FLINK users, or users which
> > > > build new pipelines, from now on would get the correct behaviour by
> > > default
> > > > without the need of changing some flag in their
> > > > configuration. It feels weird to me, especially for people very
> > familiar
> > > > with standard SQL, to be obliged to set some config
> > > > flag, to be able to get the correct behaviour for CAST. On top, users
> > > that
> > > > won't read about this option in our docs, will,
> > > > "blindly", experience the old incorrect behaviour for their new
> > > pipelines,
> > > > and issues that could cause the CAST to fail
> > > > will remain hidden from them, since *nulls* would be emitted. IMHO,
> > this
> > > > last part is also very important during the development
> > > > stages of an application/pipeline. Normally, developers would want to
> > see
> > > > all possible errors/scenarios during development
> > > > stages , in order to build a robust production system. If errors, are
> > > > hidden then, they can easily end up with those errors
> > > > in the production system which would be even harder to discover and
> > > debug,
> > > > since no exception will ever be thrown.
> > > > Imagine that there is a CAST which generates nulls because of runtime
> > > > errors, and its result is used in an aggregate function:
> > > >
> > > > SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY ....
> > > >
> > > > If nulls are emitted by the CAST (let's say because some records
> have a
> > > > quoted string value for col1, then simply the AVG result
> > > > would be wrong and in would be extremely difficult to realise and fix
> > the
> > > > issue by simply wrapping col1 first with, for example,
> > > > a REGEXP_REPLACE, to get rid of the quotes.
> > > >
> > > > My second argument is that, since we have marked
> > > > *table.exec.legacy-cast-behaviour* as deprecated, and we want to
> > > completely
> > > > remove it in the future, if the default value is *DISABLED*, when
> it's
> > > > removed we also make a breaking change, by changing the
> > > > default behaviour for all users, which is against the common software
> > > > practices. You introduce a deprecated flag to help users
> > > > using old versions of the software to smoothly transition to the new
> > > > version, while the new users experience the new features/behaviour,
> > > > without the need to set a flag. So, when in the future this flag is
> > > > completely removed, for those "new" users it would be completely
> > > > transparent. If we continue with having the default value *ENABLED*,
> > > those
> > > > new user would experience an "unnatural" breaking change
> > > > when this option is completely removed.
> > > >
> > > > Best,
> > > > Marios
> > > >
> > >
> >
> >
> > --
> > Marios
> >
>

Re: [DISCUSS] CAST legacy behaviour

Posted by Martijn Visser <ma...@ververica.com>.
+1 for setting this option to disabled by default. I believe failures
should be brought forward as soon as possible, so they can be fixed as fast
as possible. It will also be less confusing for new users. Last but not
least, I believe the impact on existing users will be minimal (since it can
be changed by changing one flag).

Best regards,

Martijn

On Tue, 22 Feb 2022 at 17:55, Marios Trivyzas <ma...@gmail.com> wrote:

> Thanks Francesco,
>
> The two arguments you posted, further strengthen the need to make it
> DISABLED by default.
>
> On Tue, Feb 22, 2022 at 12:10 PM Francesco Guardiani <
> francesco@ververica.com> wrote:
>
> > Hi all,
> > I'm +1 with what everything you said Marios.
> >
> > I'm gonna add another argument on top of that: the "legacy-cast-behavior"
> > has also a broken type inference, leading to incorrect results or further
> > errors down in the pipeline[1]. For example, take this:
> >
> > SELECT COALESCE(CAST('a' AS INT), 0) ...
> >
> > With the legacy cast behavior ENABLED, this is going to lead to the wrong
> > result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return value is
> > inferred as INT NOT NULL, so the planner will drop COALESCE, and will
> never
> > return 0. Essentially, CAST will infer the wrong nullability leading to
> > assigning its result to a NOT NULL type, when its value can effectively
> be
> > NULL.
> >
> > > You introduce a deprecated flag to help users
> > using old versions of the software to smoothly transition to the new
> > version, while the new users experience the new features/behavior,
> > without the need to set a flag.
> >
> > This is IMO the major point on why we should disable the legacy cast
> > behavior by default. This is even more relevant with 1.15 and the upgrade
> > story, as per the problem described above, because users will now end up
> > generating plans with wrong type inference, which will be hard to migrate
> > in the next flink versions.
> >
> > FG
> >
> > [1] In case you're wondering why it wasn't fixed, the reason is that
> fixing
> > it means creating a breaking change, for details
> > https://github.com/apache/flink/pull/18611#issuecomment-1028174877
> >
> >
> > On Tue, Feb 22, 2022 at 10:07 AM Marios Trivyzas <ma...@gmail.com>
> wrote:
> >
> > > Hello all!
> > >
> > > I would like to bring back the discussion regarding the
> > > *table.exec.legacy-cast-behaviour*
> > > configuration option which we are introducing with Flink *1.15*. This
> > > option provides the users
> > > with the flexibility to continue using the old (incorrect, according to
> > SQL
> > > standards) behaviour
> > > of *CAST.*
> > >
> > > With Flink *1.15* we have introduced a bunch of fixes, improvements and
> > new
> > > casting functionality
> > > between types, see https://issues.apache.org/jira/browse/FLINK-24403,
> > and
> > > some of them are
> > > guarded behind the legacy behaviour option:
> > >
> > >    - Trimming and padding when casting to CHAR/VARCHAR types to respect
> > the
> > >    specified length
> > >    - Changes for casting various types to CHAR/VARCHAR/STRING
> > >    - Runtime errors for CAST no longer emit *null *as result but
> > exceptions
> > >    are thrown with a meaningful message for the cause, that fail the
> > > pipeline. *TRY_CAST
> > >    *is introduced instead, which emits *null* results instead of
> throwing
> > >    exceptions.
> > >
> > > Those changes become active if users set the
> > > *table.exec.legacy-cast-behaviour
> > > *option to *DISABLED*, otherwise
> > > they will continue to experience the old, *erroneous*, behaviour of
> > *CAST*.
> > >
> > > Currently, we have set the *table.exec.legacy-cast-behaviour *option
> > > to be *ENABLED
> > > *by default, so if users want
> > > to get the new correct behaviour, they are required to set explicitly
> the
> > > option to *DISABLED*. Moreover, the option
> > > itself is marked as deprecated, since the plan is to be removed in the
> > > future, so that the old, erroneous behaviour
> > > won't be an option, and the *TRY_CAST* would be the way to go if users
> > > don't want to have errors and failed pipelines,
> > > but have *null*s emitted in case of runtime errors when casting.
> > >
> > > I would like to start a discussion and maybe ask for voting, so that we
> > set
> > > the *table.exec.legacy-cast-behaviour* option
> > > to *DISABLED *by default, so that users that want to keep their old
> > > pipelines working the same way, without changing their
> > > SQL/TableAPI code, would need to explicitly set it to *ENABLED.*
> > >
> > > My main argument for changing the default value for the option, is that
> > the
> > > *DISABLED* value is the one that enables
> > > the *correct* behaviour for CAST which should be the default for all
> new
> > > users. This way, new FLINK users, or users which
> > > build new pipelines, from now on would get the correct behaviour by
> > default
> > > without the need of changing some flag in their
> > > configuration. It feels weird to me, especially for people very
> familiar
> > > with standard SQL, to be obliged to set some config
> > > flag, to be able to get the correct behaviour for CAST. On top, users
> > that
> > > won't read about this option in our docs, will,
> > > "blindly", experience the old incorrect behaviour for their new
> > pipelines,
> > > and issues that could cause the CAST to fail
> > > will remain hidden from them, since *nulls* would be emitted. IMHO,
> this
> > > last part is also very important during the development
> > > stages of an application/pipeline. Normally, developers would want to
> see
> > > all possible errors/scenarios during development
> > > stages , in order to build a robust production system. If errors, are
> > > hidden then, they can easily end up with those errors
> > > in the production system which would be even harder to discover and
> > debug,
> > > since no exception will ever be thrown.
> > > Imagine that there is a CAST which generates nulls because of runtime
> > > errors, and its result is used in an aggregate function:
> > >
> > > SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY ....
> > >
> > > If nulls are emitted by the CAST (let's say because some records have a
> > > quoted string value for col1, then simply the AVG result
> > > would be wrong and in would be extremely difficult to realise and fix
> the
> > > issue by simply wrapping col1 first with, for example,
> > > a REGEXP_REPLACE, to get rid of the quotes.
> > >
> > > My second argument is that, since we have marked
> > > *table.exec.legacy-cast-behaviour* as deprecated, and we want to
> > completely
> > > remove it in the future, if the default value is *DISABLED*, when it's
> > > removed we also make a breaking change, by changing the
> > > default behaviour for all users, which is against the common software
> > > practices. You introduce a deprecated flag to help users
> > > using old versions of the software to smoothly transition to the new
> > > version, while the new users experience the new features/behaviour,
> > > without the need to set a flag. So, when in the future this flag is
> > > completely removed, for those "new" users it would be completely
> > > transparent. If we continue with having the default value *ENABLED*,
> > those
> > > new user would experience an "unnatural" breaking change
> > > when this option is completely removed.
> > >
> > > Best,
> > > Marios
> > >
> >
>
>
> --
> Marios
>

Re: [DISCUSS] CAST legacy behaviour

Posted by Marios Trivyzas <ma...@gmail.com>.
Thanks Francesco,

The two arguments you posted, further strengthen the need to make it
DISABLED by default.

On Tue, Feb 22, 2022 at 12:10 PM Francesco Guardiani <
francesco@ververica.com> wrote:

> Hi all,
> I'm +1 with what everything you said Marios.
>
> I'm gonna add another argument on top of that: the "legacy-cast-behavior"
> has also a broken type inference, leading to incorrect results or further
> errors down in the pipeline[1]. For example, take this:
>
> SELECT COALESCE(CAST('a' AS INT), 0) ...
>
> With the legacy cast behavior ENABLED, this is going to lead to the wrong
> result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return value is
> inferred as INT NOT NULL, so the planner will drop COALESCE, and will never
> return 0. Essentially, CAST will infer the wrong nullability leading to
> assigning its result to a NOT NULL type, when its value can effectively be
> NULL.
>
> > You introduce a deprecated flag to help users
> using old versions of the software to smoothly transition to the new
> version, while the new users experience the new features/behavior,
> without the need to set a flag.
>
> This is IMO the major point on why we should disable the legacy cast
> behavior by default. This is even more relevant with 1.15 and the upgrade
> story, as per the problem described above, because users will now end up
> generating plans with wrong type inference, which will be hard to migrate
> in the next flink versions.
>
> FG
>
> [1] In case you're wondering why it wasn't fixed, the reason is that fixing
> it means creating a breaking change, for details
> https://github.com/apache/flink/pull/18611#issuecomment-1028174877
>
>
> On Tue, Feb 22, 2022 at 10:07 AM Marios Trivyzas <ma...@gmail.com> wrote:
>
> > Hello all!
> >
> > I would like to bring back the discussion regarding the
> > *table.exec.legacy-cast-behaviour*
> > configuration option which we are introducing with Flink *1.15*. This
> > option provides the users
> > with the flexibility to continue using the old (incorrect, according to
> SQL
> > standards) behaviour
> > of *CAST.*
> >
> > With Flink *1.15* we have introduced a bunch of fixes, improvements and
> new
> > casting functionality
> > between types, see https://issues.apache.org/jira/browse/FLINK-24403,
> and
> > some of them are
> > guarded behind the legacy behaviour option:
> >
> >    - Trimming and padding when casting to CHAR/VARCHAR types to respect
> the
> >    specified length
> >    - Changes for casting various types to CHAR/VARCHAR/STRING
> >    - Runtime errors for CAST no longer emit *null *as result but
> exceptions
> >    are thrown with a meaningful message for the cause, that fail the
> > pipeline. *TRY_CAST
> >    *is introduced instead, which emits *null* results instead of throwing
> >    exceptions.
> >
> > Those changes become active if users set the
> > *table.exec.legacy-cast-behaviour
> > *option to *DISABLED*, otherwise
> > they will continue to experience the old, *erroneous*, behaviour of
> *CAST*.
> >
> > Currently, we have set the *table.exec.legacy-cast-behaviour *option
> > to be *ENABLED
> > *by default, so if users want
> > to get the new correct behaviour, they are required to set explicitly the
> > option to *DISABLED*. Moreover, the option
> > itself is marked as deprecated, since the plan is to be removed in the
> > future, so that the old, erroneous behaviour
> > won't be an option, and the *TRY_CAST* would be the way to go if users
> > don't want to have errors and failed pipelines,
> > but have *null*s emitted in case of runtime errors when casting.
> >
> > I would like to start a discussion and maybe ask for voting, so that we
> set
> > the *table.exec.legacy-cast-behaviour* option
> > to *DISABLED *by default, so that users that want to keep their old
> > pipelines working the same way, without changing their
> > SQL/TableAPI code, would need to explicitly set it to *ENABLED.*
> >
> > My main argument for changing the default value for the option, is that
> the
> > *DISABLED* value is the one that enables
> > the *correct* behaviour for CAST which should be the default for all new
> > users. This way, new FLINK users, or users which
> > build new pipelines, from now on would get the correct behaviour by
> default
> > without the need of changing some flag in their
> > configuration. It feels weird to me, especially for people very familiar
> > with standard SQL, to be obliged to set some config
> > flag, to be able to get the correct behaviour for CAST. On top, users
> that
> > won't read about this option in our docs, will,
> > "blindly", experience the old incorrect behaviour for their new
> pipelines,
> > and issues that could cause the CAST to fail
> > will remain hidden from them, since *nulls* would be emitted. IMHO, this
> > last part is also very important during the development
> > stages of an application/pipeline. Normally, developers would want to see
> > all possible errors/scenarios during development
> > stages , in order to build a robust production system. If errors, are
> > hidden then, they can easily end up with those errors
> > in the production system which would be even harder to discover and
> debug,
> > since no exception will ever be thrown.
> > Imagine that there is a CAST which generates nulls because of runtime
> > errors, and its result is used in an aggregate function:
> >
> > SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY ....
> >
> > If nulls are emitted by the CAST (let's say because some records have a
> > quoted string value for col1, then simply the AVG result
> > would be wrong and in would be extremely difficult to realise and fix the
> > issue by simply wrapping col1 first with, for example,
> > a REGEXP_REPLACE, to get rid of the quotes.
> >
> > My second argument is that, since we have marked
> > *table.exec.legacy-cast-behaviour* as deprecated, and we want to
> completely
> > remove it in the future, if the default value is *DISABLED*, when it's
> > removed we also make a breaking change, by changing the
> > default behaviour for all users, which is against the common software
> > practices. You introduce a deprecated flag to help users
> > using old versions of the software to smoothly transition to the new
> > version, while the new users experience the new features/behaviour,
> > without the need to set a flag. So, when in the future this flag is
> > completely removed, for those "new" users it would be completely
> > transparent. If we continue with having the default value *ENABLED*,
> those
> > new user would experience an "unnatural" breaking change
> > when this option is completely removed.
> >
> > Best,
> > Marios
> >
>


-- 
Marios

Re: [DISCUSS] CAST legacy behaviour

Posted by Francesco Guardiani <fr...@ververica.com>.
Hi all,
I'm +1 with what everything you said Marios.

I'm gonna add another argument on top of that: the "legacy-cast-behavior"
has also a broken type inference, leading to incorrect results or further
errors down in the pipeline[1]. For example, take this:

SELECT COALESCE(CAST('a' AS INT), 0) ...

With the legacy cast behavior ENABLED, this is going to lead to the wrong
result, as 'a' is inferred as VARCHAR NOT NULL, the CAST return value is
inferred as INT NOT NULL, so the planner will drop COALESCE, and will never
return 0. Essentially, CAST will infer the wrong nullability leading to
assigning its result to a NOT NULL type, when its value can effectively be
NULL.

> You introduce a deprecated flag to help users
using old versions of the software to smoothly transition to the new
version, while the new users experience the new features/behavior,
without the need to set a flag.

This is IMO the major point on why we should disable the legacy cast
behavior by default. This is even more relevant with 1.15 and the upgrade
story, as per the problem described above, because users will now end up
generating plans with wrong type inference, which will be hard to migrate
in the next flink versions.

FG

[1] In case you're wondering why it wasn't fixed, the reason is that fixing
it means creating a breaking change, for details
https://github.com/apache/flink/pull/18611#issuecomment-1028174877


On Tue, Feb 22, 2022 at 10:07 AM Marios Trivyzas <ma...@gmail.com> wrote:

> Hello all!
>
> I would like to bring back the discussion regarding the
> *table.exec.legacy-cast-behaviour*
> configuration option which we are introducing with Flink *1.15*. This
> option provides the users
> with the flexibility to continue using the old (incorrect, according to SQL
> standards) behaviour
> of *CAST.*
>
> With Flink *1.15* we have introduced a bunch of fixes, improvements and new
> casting functionality
> between types, see https://issues.apache.org/jira/browse/FLINK-24403, and
> some of them are
> guarded behind the legacy behaviour option:
>
>    - Trimming and padding when casting to CHAR/VARCHAR types to respect the
>    specified length
>    - Changes for casting various types to CHAR/VARCHAR/STRING
>    - Runtime errors for CAST no longer emit *null *as result but exceptions
>    are thrown with a meaningful message for the cause, that fail the
> pipeline. *TRY_CAST
>    *is introduced instead, which emits *null* results instead of throwing
>    exceptions.
>
> Those changes become active if users set the
> *table.exec.legacy-cast-behaviour
> *option to *DISABLED*, otherwise
> they will continue to experience the old, *erroneous*, behaviour of *CAST*.
>
> Currently, we have set the *table.exec.legacy-cast-behaviour *option
> to be *ENABLED
> *by default, so if users want
> to get the new correct behaviour, they are required to set explicitly the
> option to *DISABLED*. Moreover, the option
> itself is marked as deprecated, since the plan is to be removed in the
> future, so that the old, erroneous behaviour
> won't be an option, and the *TRY_CAST* would be the way to go if users
> don't want to have errors and failed pipelines,
> but have *null*s emitted in case of runtime errors when casting.
>
> I would like to start a discussion and maybe ask for voting, so that we set
> the *table.exec.legacy-cast-behaviour* option
> to *DISABLED *by default, so that users that want to keep their old
> pipelines working the same way, without changing their
> SQL/TableAPI code, would need to explicitly set it to *ENABLED.*
>
> My main argument for changing the default value for the option, is that the
> *DISABLED* value is the one that enables
> the *correct* behaviour for CAST which should be the default for all new
> users. This way, new FLINK users, or users which
> build new pipelines, from now on would get the correct behaviour by default
> without the need of changing some flag in their
> configuration. It feels weird to me, especially for people very familiar
> with standard SQL, to be obliged to set some config
> flag, to be able to get the correct behaviour for CAST. On top, users that
> won't read about this option in our docs, will,
> "blindly", experience the old incorrect behaviour for their new pipelines,
> and issues that could cause the CAST to fail
> will remain hidden from them, since *nulls* would be emitted. IMHO, this
> last part is also very important during the development
> stages of an application/pipeline. Normally, developers would want to see
> all possible errors/scenarios during development
> stages , in order to build a robust production system. If errors, are
> hidden then, they can easily end up with those errors
> in the production system which would be even harder to discover and debug,
> since no exception will ever be thrown.
> Imagine that there is a CAST which generates nulls because of runtime
> errors, and its result is used in an aggregate function:
>
> SELECT AVG(CAST(col1 AS FLOAT)), ... FROM t GROUP BY ....
>
> If nulls are emitted by the CAST (let's say because some records have a
> quoted string value for col1, then simply the AVG result
> would be wrong and in would be extremely difficult to realise and fix the
> issue by simply wrapping col1 first with, for example,
> a REGEXP_REPLACE, to get rid of the quotes.
>
> My second argument is that, since we have marked
> *table.exec.legacy-cast-behaviour* as deprecated, and we want to completely
> remove it in the future, if the default value is *DISABLED*, when it's
> removed we also make a breaking change, by changing the
> default behaviour for all users, which is against the common software
> practices. You introduce a deprecated flag to help users
> using old versions of the software to smoothly transition to the new
> version, while the new users experience the new features/behaviour,
> without the need to set a flag. So, when in the future this flag is
> completely removed, for those "new" users it would be completely
> transparent. If we continue with having the default value *ENABLED*, those
> new user would experience an "unnatural" breaking change
> when this option is completely removed.
>
> Best,
> Marios
>