Posted to dev@avro.apache.org by Christophe Le Saëc <ch...@gmail.com> on 2023/09/18 08:19:18 UTC

New Big Decimal Logical type

Hello,

This JIRA ticket <https://issues.apache.org/jira/browse/AVRO-3779> would
introduce a new Big-Decimal logical type where precision and scale are not
given in the type definition; everything is embedded in the value (this new
logical type does not replace the current one). This is useful when you need
a BigDecimal without knowing anything about the values when defining the Avro
type.
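
To make "everything is embedded in the value" concrete, here is a minimal sketch using only the JDK. The byte layout (a 4-byte scale followed by the unscaled value) and the class name EmbeddedScaleDecimal are assumptions chosen for illustration; they are not the encoding proposed in AVRO-3779.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.nio.ByteBuffer;

// Hypothetical layout, for illustration only: a 4-byte big-endian scale
// followed by the two's-complement bytes of the unscaled value.
final class EmbeddedScaleDecimal {

    // Serialize the value together with its own scale.
    static byte[] encode(BigDecimal value) {
        byte[] unscaled = value.unscaledValue().toByteArray();
        return ByteBuffer.allocate(Integer.BYTES + unscaled.length)
                .putInt(value.scale())
                .put(unscaled)
                .array();
    }

    // Rebuild the value from the scale and unscaled bytes stored in the payload.
    static BigDecimal decode(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        int scale = buffer.getInt();
        byte[] unscaled = new byte[buffer.remaining()];
        buffer.get(unscaled);
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    public static void main(String[] args) {
        // Values with very different scales go through the same code path;
        // no precision or scale has to be fixed up front.
        for (String s : new String[] {"117230.150", "-230.150", "1.23E-8"}) {
            BigDecimal original = new BigDecimal(s);
            BigDecimal restored = decode(encode(original));
            System.out.println(original + " -> " + restored + " (scale " + restored.scale() + ")");
        }
    }
}
```

Because the scale travels with each value, the schema needs no precision or scale attribute; the cost is a few extra bytes per value.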

For the moment, there is only one PR for Java and one for Rust.
So, can we introduce this new feature knowing there is no implementation
for the other modules (at least for the moment)?

Regards,
Christophe.

Re: New Big Decimal Logical type

Posted by Christophe Le Saëc <ch...@gmail.com>.
*schema evolution on logical types*: Sounds like a good idea for a future
feature.

*stringable representation of BigDecimal*: That is indeed not available, even
with the current decimal type. It could be added via
fromCharSequence/toCharSequence for both the current and the new decimal
type, to get an exponential representation such as "1.23E-8".
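
As a rough illustration of such a string form, the sketch below relies only on java.math.BigDecimal. The helpers toText and fromText are hypothetical names standing in for a conversion's character-sequence methods; they are not Avro's API.

```java
import java.math.BigDecimal;

// Hypothetical helpers, for illustration only; not part of Avro.
final class DecimalStrings {

    // BigDecimal.toString() switches to scientific notation for small values.
    static String toText(BigDecimal value) {
        return value.toString();
    }

    // The BigDecimal(String) constructor accepts both plain and exponential forms.
    static BigDecimal fromText(CharSequence text) {
        return new BigDecimal(text.toString());
    }

    public static void main(String[] args) {
        BigDecimal tiny = new BigDecimal("1.23E-8");
        System.out.println(toText(tiny));                        // 1.23E-8
        System.out.println(tiny.toPlainString());                // 0.0000000123
        System.out.println(fromText(toText(tiny)).equals(tiny)); // true
    }
}
```

The round trip through toString() keeps both the numeric value and the scale, which a plain double conversion would not guarantee.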

Kind regards,
Christophe

On Tue, 26 Sept 2023 at 17:17, Oscar Westra van Holthe - Kind <
oscar@westravanholthe.nl> wrote:

> Hi,
>
> As far as a "stringable representation of BigDecimal" is concerned, I
> always thought it would then be modelled as a string instead of a logical
> type.
>
> The difficult part here is how to handle an unknown scale, but with the
> requirement to have exact data. But honestly, I think we don't want that.
>
> I would, however, like to see a way to do schema evolution on logical types.
> That would allow (for example) widening conversions on decimals, same as
> for int/float now.
>
>
> Kind regards,
> Oscar
>
> --
>
> ✉️ Oscar Westra van Holthe - Kind <op...@apache.org>
>
> 🌐 https://github.com/opwvhk/
>

Re: New Big Decimal Logical type

Posted by Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>.
Hi,

As far as a "stringable representation of BigDecimal" is concerned, I
always thought it would then be modelled as a string instead of a logical
type.

The difficult part here is how to handle an unknown scale, but with the
requirement to have exact data. But honestly, I think we don't want that.

I would, however, like to see a way to do schema evolution on logical types.
That would allow (for example) widening conversions on decimals, same as
for int/float now.
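
One possible rule for such a widening check is sketched below. It is an assumption for illustration, not part of the Avro specification: a decimal(p1, s1) can be read as decimal(p2, s2) without loss when the target has at least as many fractional digits and at least as many integer digits. The class and method names are hypothetical.

```java
// Hypothetical compatibility rule for decimal widening; not defined by Avro.
final class DecimalWidening {

    // True when every value of decimal(fromPrecision, fromScale) is exactly
    // representable as decimal(toPrecision, toScale).
    static boolean canWiden(int fromPrecision, int fromScale, int toPrecision, int toScale) {
        int fromIntegerDigits = fromPrecision - fromScale;
        int toIntegerDigits = toPrecision - toScale;
        return toScale >= fromScale && toIntegerDigits >= fromIntegerDigits;
    }

    public static void main(String[] args) {
        System.out.println(canWiden(9, 2, 12, 4)); // more digits on both sides
        System.out.println(canWiden(9, 2, 9, 4));  // integer part would shrink
        System.out.println(canWiden(9, 2, 12, 1)); // fractional part would shrink
    }
}
```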


Kind regards,
Oscar

-- 

✉️ Oscar Westra van Holthe - Kind <op...@apache.org>

🌐 https://github.com/opwvhk/

Re: New Big Decimal Logical type

Posted by Christophe Le Saëc <ch...@gmail.com>.
*Historically, the stringable representation of a BigDecimal was used*

Currently, if I use the decimal logical type with the JSON encoder, with this
Java code:

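import java.io.ByteArrayOutputStream;
import java.math.BigDecimal;
import org.apache.avro.Conversions;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

// Record with two decimal(9, 2) fields: "f" backed by bytes, "f2" by a 10-byte fixed.
String schemaStr =
    "{\"type\":\"record\",\"name\":\"my_record\",\"doc\":\"doc\","
  + "\"fields\":[{\"name\":\"f\", "
  + "\"type\":{\"type\":\"bytes\",\"logicalType\":\"decimal\",\"precision\":9,\"scale\":2}},"
  + "{\"name\":\"f2\", "
  + "\"type\":{\"type\":\"fixed\", \"name\":\"FIX1\", \"size\": 10, "
  + "\"logicalType\":\"decimal\",\"precision\":9,\"scale\":2}}"
  + "]}";
Schema schema = new Schema.Parser().parse(schemaStr);

// Register the decimal conversion so BigDecimal values are accepted for both fields.
GenericData.get().addLogicalTypeConversion(new Conversions.DecimalConversion());

GenericData.Record record = new GenericData.Record(schema);
record.put(0, new BigDecimal("117230.150"));
record.put(1, new BigDecimal("-230.150"));

// Write the record with the pretty-printing JSON encoder and print the result.
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, outputStream, true);
GenericDatumWriter<Object> writer = new GenericDatumWriter<>(schema);
writer.write(record, encoder);
encoder.flush();
System.out.println(outputStream.toString());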

I get this result:

{
  "f" : "\u0000²á\u0007",
  "f2" : "ÿÿÿÿÿÿÿÿ¦\u0019"
}

But not a string representation such as "f" : "117230.15".


The unit test TestBigDecimalConversion
<https://github.com/apache/avro/blob/4289c51ab95af84d7fdcccca1fa44d7a1e54589c/lang/java/avro/src/test/java/org/apache/avro/TestBigDecimalConversion.java>
shows that the new type can convert a wide range of BigDecimal values without
error; the same test *fails with the current decimal converter*, because once
you choose a scale and precision, they can't be adapted to every number. So,
for use cases where you know roughly the range of values the field will take,
the current logical type is fine, as you can anticipate scale & precision. The
new type is suited to cases where you have strictly no idea what the scale and
precision will be.
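
The constraint described above can be illustrated with the JDK alone. This is a minimal sketch, not the code of Avro's decimal converter; the class FixedDecimalCheck and its fits method are hypothetical names.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hypothetical check, for illustration only: can a value be stored exactly
// in a decimal type whose precision and scale are fixed by the schema?
final class FixedDecimalCheck {

    static boolean fits(BigDecimal value, int precision, int scale) {
        try {
            // UNNECESSARY makes setScale throw instead of silently rounding.
            BigDecimal rescaled = value.setScale(scale, RoundingMode.UNNECESSARY);
            return rescaled.precision() <= precision;
        } catch (ArithmeticException e) {
            return false; // rescaling would drop non-zero digits
        }
    }

    public static void main(String[] args) {
        // decimal(9, 2), as in the schema used earlier in this message.
        System.out.println(fits(new BigDecimal("117230.150"), 9, 2));    // only a trailing zero is dropped
        System.out.println(fits(new BigDecimal("1.23E-8"), 9, 2));       // needs scale 10
        System.out.println(fits(new BigDecimal("12345678901.5"), 9, 2)); // too many digits
    }
}
```

Whatever precision and scale are chosen, some value falls outside them, which is the case the new type is meant to cover.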

For naming (currently "big-decimal"), I'm open to change. It's a Java &
Rust name, as the type is available under that name in both languages; C# has
BigRational, but with different logic, and the Avro C# module contains an
"AvroDecimal" class whose error messages mention BigDecimal. Maybe we could
keep the "decimal" name in the schema definition and pick one type or the
other depending on whether the precision and scale parameters are defined;
but that could also be more confusing.


Regards,
Christophe.


On Wed, 20 Sept 2023 at 19:10, Ryan Skraba <ry...@skraba.com> wrote:

> Hello!  There are a couple of things going on, but I think this new type
> needs to be better specified in the documentation before merging!
>
> Specifically, how are the bytes constructed and why was this
> representation chosen?  Is there a better, more neutral representation
> where the underlying type could still be useful in languages that
> don't support this logical type? I'd also be interested in finding a
> less "Java" name for arbitrary precision values than BigDecimal if
> possible!
>
> Historically, the stringable representation of a BigDecimal was used
> for these values.  I think we'd have to make an argument that the
> binary representation is more compact and/or efficient than this.
>
> In my experience, there aren't many use cases that require an
> arbitrary precision per value (as opposed to decimal, which sets it
> for the schema), and it's usually because "we don't know the precision
> of our data yet".   As an example, Google Cloud Spanner recommends
> using Strings when a predefined precision and scale is unsatisfactory
> for a column[1].  Is there a better way for us to help the user when
> they don't know the precision of data they expect to work with?
>
> If there are other use cases for arbitrary sized data, maybe we can
> align on a solution!
>
> All my best, Ryan
>
> [1]:
> https://cloud.google.com/spanner/docs/storing-numeric-data#recommendation_store_arbitrary_precision_numbers_as_strings
>

Re: New Big Decimal Logical type

Posted by Ryan Skraba <ry...@skraba.com>.
Hello!  There are a couple of things going on, but I think this new type
needs to be better specified in the documentation before merging!

Specifically, how are the bytes constructed and why was this
representation chosen?  Is there a better, more neutral representation
where the underlying type could still be useful in languages that
don't support this logical type? I'd also be interested in finding a
less "Java" name for arbitrary precision values than BigDecimal if
possible!

Historically, the stringable representation of a BigDecimal was used
for these values.  I think we'd have to make an argument that the
binary representation is more compact and/or efficient than this.

In my experience, there aren't many use cases that require an
arbitrary precision per value (as opposed to decimal, which sets it
for the schema), and it's usually because "we don't know the precision
of our data yet".   As an example, Google Cloud Spanner recommends
using Strings when a predefined precision and scale is unsatisfactory
for a column[1].  Is there a better way for us to help the user when
they don't know the precision of data they expect to work with?

If there are other use cases for arbitrary sized data, maybe we can
align on a solution!

All my best, Ryan

[1]: https://cloud.google.com/spanner/docs/storing-numeric-data#recommendation_store_arbitrary_precision_numbers_as_strings

On Wed, Sep 20, 2023 at 6:19 PM Oscar Westra van Holthe - Kind
<os...@westravanholthe.nl> wrote:
>
> On Wed, 20 Sept 2023 at 15:53, Martin Grigorov <mg...@apache.org> wrote:
>
> > On Mon, Sep 18, 2023 at 11:21 AM Christophe Le Saëc <ch...@gmail.com>
> > wrote:
> > > This JIRA ticket <https://issues.apache.org/jira/browse/AVRO-3779> would
> > > introduce a new Big-Decimal logical type [...]
> > >
> > > For the moment, there is only one PR for Java and one for Rust.
> > > So, can we introduce this new feature knowing there is no implementation
> > > for the other modules (at least for the moment)?
> >
> > I am OK with this approach!
> > Do you think it would be a good idea to mark this new logical type as
> > experimental in the specification?
> >
>
> Yes, I do.
>
> There are similarities with the "duration" logical type: that is available
> as a logical type in most (all?) implementations, but using actual values of
> that type is (AFAIK) only possible in Rust. The Java codebase, for example,
> cannot convert the Avro value.
>
> So in all honesty, I think marking it as experimental is not needed. But I
> think we should, because marking new logical types as experimental allows
> the option to abandon it later on.
>
>
> Kind regards,
> Oscar
>
> --
>
> ✉️ Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>

Re: New Big Decimal Logical type

Posted by Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>.
On Wed, 20 Sept 2023 at 15:53, Martin Grigorov <mg...@apache.org> wrote:

> On Mon, Sep 18, 2023 at 11:21 AM Christophe Le Saëc <ch...@gmail.com>
> wrote:
> > This JIRA ticket <https://issues.apache.org/jira/browse/AVRO-3779> would
> > introduce a new Big-Decimal logical type [...]
> >
> > For the moment, there is only one PR for Java and one for Rust.
> > So, can we introduce this new feature knowing there is no implementation
> > for the other modules (at least for the moment)?
>
> I am OK with this approach!
> Do you think it would be a good idea to mark this new logical type as
> experimental in the specification?
>

Yes, I do.

There are similarities with the "duration" logical type: that is available
as a logical type in most (all?) implementations, but using actual values of
that type is (AFAIK) only possible in Rust. The Java codebase, for example,
cannot convert the Avro value.

So in all honesty, I think marking it as experimental is not needed. But I
think we should, because marking new logical types as experimental allows
the option to abandon it later on.


Kind regards,
Oscar

-- 

✉️ Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>

Re: New Big Decimal Logical type

Posted by Martin Grigorov <mg...@apache.org>.
Hi,

On Mon, Sep 18, 2023 at 11:21 AM Christophe Le Saëc <ch...@gmail.com>
wrote:

> Hello,
>
> This JIRA ticket <https://issues.apache.org/jira/browse/AVRO-3779> would
> introduce a new Big-Decimal logical type where precision and scale are not
> given in the type definition; everything is embedded in the value (this new
> logical type does not replace the current one). This is useful when you need
> a BigDecimal without knowing anything about the values when defining the Avro
> type.
>
> For the moment, there is only one PR for Java and one for Rust.
> So, can we introduce this new feature knowing there is no implementation
> for the other modules (at least for the moment)?
>

I am OK with this approach!
Do you think it would be a good idea to mark this new logical type as
experimental in the specification?



>
> Regards,
> Christophe.
>