Posted to issues@avro.apache.org by "Ten (Jira)" <ji...@apache.org> on 2022/12/18 22:44:00 UTC

[jira] [Commented] (AVRO-3631) Fix serialization of structs containing Fixed fields

    [ https://issues.apache.org/jira/browse/AVRO-3631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17649103#comment-17649103 ] 

Ten commented on AVRO-3631:
---------------------------

It looks like this issue essentially comes down to the fact that we assume it's always possible to convert a Rust struct into an Avro value deterministically. The truth is that structs serialized through Serde can map to different kinds of Avro values: this applies to [u8] and co, but also to Union{null, Something}, which should probably be serialized as null if the field is missing, and to integers, which may also be serialized as timestamps.

Overall I think this means that when serializing through the Serde framework, we should have knowledge of the schema, and I don't think that should be avoided at all costs. This would give much more flexibility.
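
To make the idea concrete, here is a minimal sketch of schema-guided conversion. The `Schema` and `Value` enums below are simplified, hypothetical stand-ins (not the real `apache_avro` types), and `bytes_to_value` is an illustrative helper name, not an existing function in the crate. It only shows how the same byte data could become a different Avro value depending on the schema:

```rust
// Hypothetical, simplified stand-ins for the crate's `Schema` and `Value`
// types; only the variants needed for this illustration are modelled.
#[derive(Debug, Clone, PartialEq)]
pub enum Schema {
    Bytes,
    Fixed { size: usize },
    ArrayOfInt,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i32),
    Bytes(Vec<u8>),
    Fixed(usize, Vec<u8>),
    Array(Vec<Value>),
}

/// Convert a byte slice into whichever `Value` variant the schema asks for.
/// Without the schema, the serializer has to guess one of these variants.
pub fn bytes_to_value(data: &[u8], schema: &Schema) -> Result<Value, String> {
    match schema {
        Schema::Bytes => Ok(Value::Bytes(data.to_vec())),
        // A fixed schema only accepts data of exactly the declared size.
        Schema::Fixed { size } if *size == data.len() => {
            Ok(Value::Fixed(*size, data.to_vec()))
        }
        Schema::Fixed { size } => Err(format!(
            "expected {} bytes for fixed, got {}",
            size,
            data.len()
        )),
        // An array-of-int schema widens every byte to an Avro int.
        Schema::ArrayOfInt => Ok(Value::Array(
            data.iter().map(|b| Value::Int(i32::from(*b))).collect(),
        )),
    }
}
```

With this shape, a `[u8; 6]` field would become `Value::Fixed(6, ..)` under a fixed schema of size 6 and `Value::Array` under an array-of-int schema, and a size mismatch surfaces as an error at conversion time rather than at validation time.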

 

It looks like to some extent the same applies to deserialization (one may want to turn Avro schema into structs based on the actual Avro type) - although I can't think of a case apart from constructing `types::Value` itself, and I'm not sure what this is even useful for in practice.

> Fix serialization of structs containing Fixed fields
> ----------------------------------------------------
>
>                 Key: AVRO-3631
>                 URL: https://issues.apache.org/jira/browse/AVRO-3631
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: rust
>            Reporter: Rik Heijdens
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Consider the following minimal Avro Schema:
> {noformat}
> {
>     "type": "record",
>     "name": "TestStructFixedField",
>     "fields": [
>         {
>             "name": "field",
>             "type": {
>                 "name": "field",
>                 "type": "fixed",
>                 "size": 6
>             }
>         }
>     ]
> }
> {noformat}
> In Rust, I might represent this schema with the following struct:
> {noformat}
> #[derive(Debug, Serialize, Deserialize)]
> struct TestStructFixedField {
>     field: [u8; 6]
> }
> {noformat}
> I would then expect to be able to use `apache_avro::to_avro_datum()` to convert an instance of `TestStructFixedField` into a `Vec<u8>` using an instance of `Schema` initialized from the schema listed above.
> However, this fails because the `Value` produced by `apache_avro::to_value()` represents `field` as a `Value::Array<Value::Int>` rather than a `Value::Fixed<6, Vec<u8>>`, which does not pass schema validation.
> I believe that there are two options to fix this:
> 1. Allow Value::Array<Vec<Value::Int>> to pass validation if the array has the expected length, and none of the contents of the array are out-of-range for u8. If we go down this route, the implementation of `to_avro_datum()` will have to take care of converting Value::Int to u8 when converting into bytes.
> 2. Update `apache_avro::to_value()` such that fixed length arrays are converted into `Value::Fixed<N, Vec<u8>>` rather than `Value::Array`.
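
As a rough illustration of option 1 from the issue description, the check could look something like the sketch below. The `Value` enum is a reduced, hypothetical stand-in for the crate's type, and `array_as_fixed` is an invented helper name; this is not how `apache_avro` validates values today, only one way the rule "expected length, every element in range for u8" might be expressed:

```rust
use std::convert::TryFrom;

// Hypothetical, reduced stand-in for the crate's `Value` type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Int(i32),
    Array(Vec<Value>),
}

/// Try to reinterpret an array of ints as the payload of a fixed of `size`
/// bytes. Returns `None` if the value is not an array, has the wrong length,
/// or contains an element that is not an int in the range 0..=255.
pub fn array_as_fixed(value: &Value, size: usize) -> Option<Vec<u8>> {
    let items = match value {
        Value::Array(items) if items.len() == size => items,
        _ => return None,
    };
    items
        .iter()
        .map(|item| match item {
            // `u8::try_from` rejects negative numbers and anything above 255.
            Value::Int(n) => u8::try_from(*n).ok(),
            _ => None,
        })
        .collect()
}
```

Collecting into `Option<Vec<u8>>` stops at the first out-of-range element, so a single bad item rejects the whole array.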



--
This message was sent by Atlassian Jira
(v8.20.10#820010)