Posted to dev@flink.apache.org by Xingcan Cui <xi...@gmail.com> on 2021/08/27 22:21:40 UTC

Support decimal types with larger precisions

Hi all,

Recently, I was trying to load some CDC data from Oracle/Postgres databases
and found that the current precision range [1, 38] for DecimalType may not
meet the requirements of some source types. For instance, in Oracle, if a
column is declared as `NUMBER` without precision and scale, the values in
it could potentially be very large. Since DecimalType is backed by Java's
BigDecimal, which has no fixed precision limit, I wonder if we should
extend the precision range.
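
To make the gap concrete, here is a minimal sketch (it only assumes the
DecimalType class from flink-table-common; the demo class name is made up):

    import java.math.BigDecimal;

    import org.apache.flink.table.types.logical.DecimalType;

    // Minimal demo of the mismatch between BigDecimal and DecimalType.
    public class PrecisionMismatchDemo {
        public static void main(String[] args) {
            // BigDecimal has no fixed precision limit: 50 digits are fine.
            BigDecimal huge = new BigDecimal("9".repeat(50));
            System.out.println(huge.precision()); // prints 50

            // Flink's DecimalType, however, is capped at MAX_PRECISION (38),
            // so no matching Flink type can be declared for such a column.
            try {
                new DecimalType(huge.precision(), 0);
            } catch (Exception e) {
                System.out.println(e.getMessage()); // precision out of range
            }
        }
    }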

Best,
Xingcan

Re: Support decimal types with larger precisions

Posted by Xingcan Cui <xi...@gmail.com>.
Hi Jark and Jingsong,

Thanks for your reply! Since modifying the SQL type system needs a lot of
work, I agree that we should postpone this until we get more requests from
users.

For my own case, based on domain knowledge, I think a precision of 38 would
be enough (even though the fields were declared without any precision
constraints), so a user-defined numeric type converter would solve the
problem!
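
For anyone hitting the same issue, this is roughly the kind of converter I
have in mind (a pure-Java sketch; the class and its rescale-or-fallback
policy are hypothetical, not an existing Flink or Debezium API):

    import java.math.BigDecimal;
    import java.math.RoundingMode;

    // Hypothetical converter: coerce unconstrained NUMBER values into
    // something DECIMAL(38, s) can hold, falling back to a string.
    public final class NumericConverter {
        private static final int MAX_PRECISION = 38;

        // Returns a value that fits DECIMAL(38, scale), or null if it
        // cannot fit even after rounding to the target scale.
        public static BigDecimal fitOrNull(BigDecimal value, int scale) {
            BigDecimal rescaled = value.setScale(scale, RoundingMode.HALF_UP);
            return rescaled.precision() <= MAX_PRECISION ? rescaled : null;
        }

        // Fallback: keep full fidelity as text, e.g. in a VARCHAR column.
        public static String asString(BigDecimal value) {
            return value.toPlainString();
        }
    }

With such a converter, values that fit DECIMAL(38, s) keep their numeric
type, and anything larger degrades to a lossless string column.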

Thanks,
Xingcan

Re: Support decimal types with larger precisions

Posted by Jingsong Li <ji...@gmail.com>.
Hi Xingcan,

As a workaround, can we convert large decimals to varchar? (See the sketch
below.)

If Flink SQL wants to support large decimals, we should investigate how
other big data systems and databases handle them. As Jark said, this needs
a lot of work.
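
A sketch of how that could look from the Table API (the table, its fields,
and the bounded datagen source are made up for illustration; a real setup
would use a CDC or JDBC connector that emits the NUMBER column as text):

    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.TableEnvironment;

    public class LargeDecimalAsVarcharDemo {
        public static void main(String[] args) {
            TableEnvironment tEnv =
                    TableEnvironment.create(EnvironmentSettings.inStreamingMode());

            // Declare the unconstrained NUMBER column as STRING so that no
            // precision is lost on ingestion.
            tEnv.executeSql(
                    "CREATE TABLE orders ("
                            + "  id BIGINT,"
                            + "  amount STRING"
                            + ") WITH ("
                            + "  'connector' = 'datagen',"
                            + "  'number-of-rows' = '5'"
                            + ")");

            // Cast back to a bounded decimal in queries where 38 digits are
            // known to be enough. (With random datagen strings the cast
            // yields NULL; with real numeric text it recovers the value.)
            tEnv.executeSql(
                    "SELECT id, CAST(amount AS DECIMAL(38, 2)) AS amount "
                            + "FROM orders")
                    .print();
        }
    }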

Best,
Jingsong Lee

Re: Support decimal types with larger precisions

Posted by Jark Wu <im...@gmail.com>.
Hi Xingcan, Timo,

Yes, flink-cdc-connectors and the JDBC connector also don't support larger
precisions or columns declared without a precision.
However, we haven't received any user reports about this problem.
Maybe it is not very common for the precision to be higher than 38 or to
be left unspecified.

I think it makes sense to support this use case, but it definitely needs a
lot of work, and we need more investigation and discussion (maybe a new
type?).

Best,
Jark


Re: Support decimal types with larger precisions

Posted by Xingcan Cui <xi...@gmail.com>.
Hi Timo,

Though it's an extreme case, I still think this is a hard blocker if we
want to ingest data from an RDBMS (or from other systems supporting
large-precision numbers).

The tricky part is that users can declare numeric types without any
precision and scale restrictions in an RDBMS (e.g., NUMBER in Oracle[1]),
but in Flink, we must explicitly specify the precision and scale.

Cc Jark, do you think this is a problem for flink-cdc-connectors?

Best,
Xingcan

[1]
https://docs.oracle.com/cd/B28359_01/server.111/b28318/datatype.htm#CNCPT313

Re: Support decimal types with larger precisions

Posted by Timo Walther <tw...@apache.org>.
Hi Xingcan,

In theory, there should be no hard blocker for supporting this. The
implementation should be flexible enough in most locations. We simply
adopted 38 from the Blink code base, which in turn adopted it from Hive.

However, this could be a breaking change for existing pipelines, and we
would need to offer a flag to restore the old behavior (sketched below).
It would definitely require a lot of testing work to avoid inconsistencies.
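
Purely as a sketch of what such a flag could look like (the option name and
default are made up; nothing like it exists today):

    import org.apache.flink.configuration.ConfigOption;
    import org.apache.flink.configuration.ConfigOptions;

    // Hypothetical compatibility switch for a larger precision cap.
    public class DecimalCompatOptions {
        public static final ConfigOption<Boolean> LEGACY_DECIMAL_PRECISION =
                ConfigOptions.key("table.exec.legacy-decimal-precision")
                        .booleanType()
                        .defaultValue(true) // old behavior stays the default
                        .withDescription(
                                "If true, DECIMAL precision is capped at 38 "
                                        + "as today; if false, larger "
                                        + "precisions are accepted.");
    }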

Do you think this is a hard blocker for users?

Regards,
Timo

