Posted to dev@avro.apache.org by "Andy Coates (JIRA)" <ji...@apache.org> on 2018/06/26 11:42:00 UTC

[jira] [Commented] (AVRO-2164) Make Decimal a first class type.

    [ https://issues.apache.org/jira/browse/AVRO-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16523603#comment-16523603 ] 

Andy Coates commented on AVRO-2164:
-----------------------------------

I guess the main issue that I see is that the current decimal implementation in Avro doesn't allow for schema evolution where the scale changes. Consider a use case where a schema has a decimal field with scale 4. Then at some point later it comes to light that 4 is too little and a scale of 5 is needed, (or it's too much, is wasting space, and only 3 is needed). With the current implementation, if the scale is changed, then reading a record serialised with the old schema using the new schema as a read-schema will result in the wrong value. This, IMHO, is a big issue. This is data corruption.
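To make the failure mode concrete, here's a minimal sketch (the values are illustrative, and it uses only java.math rather than the Avro conversion classes): the wire format carries only the unscaled digits, and the scale is applied from whatever schema the reader happens to hold.

    import java.math.BigDecimal;
    import java.math.BigInteger;

    public class ScaleMismatchDemo {
        public static void main(String[] args) {
            // The decimal logical type serialises only the unscaled value;
            // the scale lives solely in the schema.
            BigDecimal written = new BigDecimal("12.3456");            // written with scale 4
            byte[] onTheWire = written.unscaledValue().toByteArray();  // the digits 123456

            // A reader whose schema now declares scale 5 reinterprets the same bytes:
            BigDecimal reRead = new BigDecimal(new BigInteger(onTheWire), 5);
            System.out.println(reRead);                                // prints 1.23456, not 12.3456
        }
    }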

Possible ways to solve this:
 - don't allow scale changes, i.e. read/write schemas with different scales should be considered incompatible.
 - convert values on the fly, i.e. deserialize the decimal using the old scale and attempt to set the new scale. Where the new scale is larger this will always work. Where the new scale is smaller this may throw an exception. (It may also make sense to allow the user to define rounding behaviour in the case where the scale has been reduced).
 - encode the scale in the serialised form, i.e. create a new type, (either first class or logical), where the serialised form is prefixed with the scale.

Of these, maybe the best is to convert on the fly?
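If we did convert on the fly, the core of it could look something like the sketch below (the method name and the rounding hook are hypothetical, not existing Avro API): widening the scale is always lossless, while narrowing either throws or rounds according to a user-supplied policy.

    import java.math.BigDecimal;
    import java.math.RoundingMode;

    public class DecimalRescaler {
        /**
         * Re-scales a value read with the writer's scale to the reader's scale.
         * Widening always succeeds; narrowing throws unless a rounding mode is given.
         */
        static BigDecimal rescale(BigDecimal written, int readerScale, RoundingMode rounding) {
            if (readerScale >= written.scale()) {
                return written.setScale(readerScale);       // lossless widening, e.g. 12.3456 -> 12.34560
            }
            if (rounding == null) {
                return written.setScale(readerScale);       // throws ArithmeticException if digits would be lost
            }
            return written.setScale(readerScale, rounding); // user-configured rounding behaviour
        }
    }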

> Make Decimal a first class type.
> --------------------------------
>
>                 Key: AVRO-2164
>                 URL: https://issues.apache.org/jira/browse/AVRO-2164
>             Project: Avro
>          Issue Type: Improvement
>          Components: logical types
>    Affects Versions: 1.8.2
>            Reporter: Andy Coates
>            Priority: Major
>
> I'd be interested to hear the community's thoughts on making decimal a first class type. 
> The current logical type encodes a decimal into a _bytes_ or _fixed_. This encoding does not include any information about the scale, i.e. this encoding is lossy. 
> There are open issues around the compatibility / evolvability of schemas containing decimal logical types, (e.g. AVRO-2078 & AVRO-1721), which mean that reading data previously written with a different scale will result in data corruption.
> If these issues were fixed, with suitable compatibility checks put in place, it would then be impossible to evolve an Avro schema where the scale needs to change. This inability to evolve the scale is very restrictive, and can result in high overhead for organizations that _need_ to change the scale, i.e. they may potentially need to copy their entire data set, deserializing with the old scale and re-serializing with the new.
> If _decimal_ were promoted to a first class type, the scale could be captured in the serialized form, allowing for schema evolution.




RE: [jira] [Commented] (AVRO-2164) Make Decimal a first class type.

Posted by Frédéric SOUCHU <Fr...@ingenico.com>.
I second the "encode the scale in the serialised form" option.
The scale should be a regular property, not a remnant of fixed-size database columns!

FYI, in my project I had to create a 'ScaledFloat' Avro type to overcome a Java/C# serialization incompatibility.
The scale is a regular type attribute, allowing data producers with different precision needs to use the same schema.
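For illustration, a scale-per-value encoding along those lines could be modelled with a plain record, something like the sketch below (the schema and helper names are a guess at the idea, not the actual 'ScaledFloat' type): the unscaled digits and the scale travel together, so the reader never has to rely on its own schema for the scale.

    import java.math.BigDecimal;
    import java.math.BigInteger;
    import java.nio.ByteBuffer;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericRecord;

    public class ScaledDecimalSketch {
        // Hypothetical record type: the scale is carried with every value
        // instead of being fixed in the schema.
        static final Schema SCHEMA = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"ScaledDecimal\",\"fields\":["
          + "{\"name\":\"unscaled\",\"type\":\"bytes\"},"
          + "{\"name\":\"scale\",\"type\":\"int\"}]}");

        static GenericRecord toRecord(BigDecimal value) {
            GenericRecord r = new GenericData.Record(SCHEMA);
            r.put("unscaled", ByteBuffer.wrap(value.unscaledValue().toByteArray()));
            r.put("scale", value.scale());
            return r;
        }

        static BigDecimal fromRecord(GenericRecord r) {
            ByteBuffer buf = ((ByteBuffer) r.get("unscaled")).duplicate();
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return new BigDecimal(new BigInteger(bytes), (Integer) r.get("scale"));
        }
    }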
