Posted to issues@avro.apache.org by "Rik Heijdens (Jira)" <ji...@apache.org> on 2022/09/29 13:02:00 UTC

[jira] [Comment Edited] (AVRO-3631) Fix serialization of structs containing Fixed fields

    [ https://issues.apache.org/jira/browse/AVRO-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17611026#comment-17611026 ] 

Rik Heijdens edited comment on AVRO-3631 at 9/29/22 1:01 PM:
-------------------------------------------------------------

Okay, so I'm starting to understand the issue a bit more, and I added a few more test-cases to the branch that I [linked earlier|https://github.com/apache/avro/compare/master...privacy-com:avro:avro-3631/fix-fixed-serialization?expand=1].

Contrary to what I initially thought, the compatibility problems with `Value::Fixed` do not appear to be isolated to serialization. They also affect deserialization of a `Value::Record` wrapping a `Value::Fixed` into a Rust struct. I added a test case in [12ef14b|https://github.com/apache/avro/commit/12ef14b6a5cc102bcc0317251cd37471148d4926] which illustrates this.

I did note, however, that this is consistent with the serialization implementation: a Rust `[u8; 6]` is serialized into a `Value::Array<Value::Int>`, as illustrated by [a31fcfc|https://github.com/apache/avro/commit/a31fcfc96493e3180490dec5622cab485bf7cd79].
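As a rough illustration of that behaviour, here is a self-contained model of a schema-less conversion that handles a byte array one element at a time. The `Value` enum and `to_value_sketch` are simplified, hypothetical stand-ins, not the actual `apache_avro` types or functions.

```rust
// Simplified stand-in for illustration; NOT the real apache_avro `Value` type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i32),
    Array(Vec<Value>),
}

/// Hypothetical schema-less conversion: with no schema to consult, each
/// element is converted on its own, so every `u8` becomes an `Int` and the
/// array as a whole becomes an `Array`.
fn to_value_sketch(bytes: &[u8]) -> Value {
    Value::Array(bytes.iter().map(|b| Value::Int(i32::from(*b))).collect())
}
```

Calling `to_value_sketch(&[0u8, 1, 2, 3, 4, 255])` in this model yields an `Array` of six `Int` values, which mirrors the shape that then fails validation against a `fixed` schema.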

However, I am unsure how we should move forward with this: at serialization time the Schema information is not available to the Serializer, so it cannot know whether we expect it to produce a `Value::Array<Value::Int>` or a `Value::Fixed`.
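One possible direction, sketched here under heavy assumptions, is to leave the schema-less step alone and coerce the value afterwards, once the schema is known. The `Value` and `Schema` enums and the `resolve_fixed` helper below are simplified, hypothetical stand-ins and do not reflect the actual `apache_avro` API.

```rust
use std::convert::TryFrom;

// Simplified stand-ins for illustration; NOT the real apache_avro types.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i32),
    Array(Vec<Value>),
    Fixed(usize, Vec<u8>),
}

#[derive(Debug)]
enum Schema {
    Fixed { size: usize },
}

/// Hypothetical helper: once the schema is known, turn an array of ints into
/// a fixed value, provided the length matches and every element fits in a u8.
fn resolve_fixed(value: Value, schema: &Schema) -> Result<Value, String> {
    let Schema::Fixed { size } = schema;
    match value {
        // Already a fixed value of the right size: pass it through.
        Value::Fixed(n, bytes) if n == *size && bytes.len() == n => Ok(Value::Fixed(n, bytes)),
        // An array of the right length: try to narrow each element to u8.
        Value::Array(items) if items.len() == *size => {
            let mut bytes = Vec::with_capacity(*size);
            for item in items {
                match item {
                    Value::Int(i) => {
                        let b = u8::try_from(i)
                            .map_err(|_| format!("{i} is out of range for u8"))?;
                        bytes.push(b);
                    }
                    other => return Err(format!("expected Int, found {other:?}")),
                }
            }
            Ok(Value::Fixed(*size, bytes))
        }
        other => Err(format!("cannot resolve {other:?} as fixed of size {size}")),
    }
}
```

In this model an array of six in-range ints resolves to a `Fixed` of size 6, while a wrong length or an element such as 256 is rejected. Whether such a coercion belongs in validation, in resolution, or elsewhere is exactly the open question.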

I'll ponder this for a bit, but I would appreciate any suggestions you have on how we can move forward, [~mgrigorov].


was (Author: JIRAUSER293264):
Okay, so I'm starting to understand the issue a bit more, and I added a few more test-cases to the branch that I [linked earlier|https://github.com/apache/avro/compare/master...privacy-com:avro:avro-3631/fix-fixed-serialization?expand=1].

Unlike what I initially thought, the compatibility problems with `Value::Fixed` does not appear to be isolated to serialization. It also affects Deserialization of a `Value::Record` wrapping a `Value::Fixed` into a Rust struct. I added a test-case in [12ef14b|https://github.com/apache/avro/commit/12ef14b6a5cc102bcc0317251cd37471148d4926] which illustrates this.

I did not however that this is consistent with the Serialization implementation: a Rust `[u8; 6]` is serialized into a `Value::Array<Value::Int>` as illustrated by [a31fcfc|https://github.com/apache/avro/commit/a31fcfc96493e3180490dec5622cab485bf7cd79].

However, I am unsure as to how we should move forward with this: at serialization time the Schema information is not available to the Serializer and thus it wouldn't know if we were expecting to serialize to `Value::Array<Value::Int>` or `Value::Fixed`.

I'll ponder on this for a bit, but would appreciate suggestions if you have any on how we can move forward with this [~mgrigorov]

> Fix serialization of structs containing Fixed fields
> ----------------------------------------------------
>
>                 Key: AVRO-3631
>                 URL: https://issues.apache.org/jira/browse/AVRO-3631
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: rust
>            Reporter: Rik Heijdens
>            Priority: Major
>
> Consider the following minimal Avro Schema:
> {noformat}
> {
>     "type": "record",
>     "name": "TestStructFixedField",
>     "fields": [
>         {
>             "name": "field",
>             "type": {
>                 "name": "field",
>                 "type": "fixed",
>                 "size": 6
>             }
>         }
>     ]
> }
> {noformat}
> In Rust, I might represent this schema with the following struct:
> {noformat}
> #[derive(Debug, Serialize, Deserialize)]
> struct TestStructFixedField {
>     field: [u8; 6]
> }
> {noformat}
> I would then expect to be able to use `apache_avro::to_avro_datum()` to convert an instance of `TestStructFixedField` into a `Vec<u8>` using an instance of `Schema` initialized from the schema listed above.
> However, this fails because the `Value` produced by `apache_avro::to_value()` represents `field` as a `Value::Array<Value::Int>` rather than a `Value::Fixed<6, Vec<u8>>`, and the resulting value does not pass schema validation.
> I believe that there are two options to fix this:
> 1. Allow Value::Array<Vec<Value::Int>> to pass validation if the array has the expected length, and none of the contents of the array are out-of-range for u8. If we go down this route, the implementation of `to_avro_datum()` will have to take care of converting Value::Int to u8 when converting into bytes.
> 2. Update `apache_avro::to_value()` such that fixed length arrays are converted into `Value::Fixed<N, Vec<u8>>` rather than `Value::Array`.


