You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@avro.apache.org by GitBox <gi...@apache.org> on 2022/10/20 09:05:49 UTC

[GitHub] [avro] martin-g opened a new pull request, #1921: AVRO-3646: serde for enum with mixed variants

martin-g opened a new pull request, #1921:
URL: https://github.com/apache/avro/pull/1921

   ## What is the purpose of the change
   
   * Adds unit tests for AVRO-3646
   
   
   ## Verifying this change
   
   This change just adds unit tests
   
   ## Documentation
   
   - Does this pull request introduce a new feature? no
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
martin-g commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1286526048

   Converted to Draft because as the JIRA ticket explains the produced `Value`s `cannot be represented using an Avro Schema`.
   The serialization should produce a `Value::Union` that wraps each of the currently produced `Value`s.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
martin-g commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1312150624

   Actually it doesn't need to `#serde(untagged)`. It could be even `#avro(something)`.
   I miss Java annotations + reflection APIs :-)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] spiegela commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
spiegela commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1313738237

   To make sure I understand the status of change, is the remaining question here how the AvroSchema derive should _”choose”_ which enum format to use when constructing the schema?
   
   If so, I was thinking that the AvroSchema derive _could_ infer the correct behavior based on the variant characteristics:
   * All unit variants, create an enum schema as we do today.
   * 1+ non-unit variants with either named fields, or single field unnamed fields, use the Record-style (as in this PR) as the schema.
   
   That leaves the case of non-unit, multi-field, unnamed fields 😅. For those, we should return a compile error, I think, and let the dev implement `AvroSchemaComponent` themselves — otherwise, _could_ add Record fields named `0`, `1`, …


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] LucasJavaudin commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
LucasJavaudin commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1286645291

   
   > Is it ? IMO the values should always be wrapped in `Value::Union`, so that it is possible to validate it with a Schema. The union items should be:
   
   Sorry, I meant that the test are useful and they currently pass but I agree that they should not pass.
   
   >     * Value::String for unit_variant
   > 
   >     * Value::Array for newtype_variant and tuple_variant
   > 
   >     * Value::Record for struct_variant
   
   For unit variant, shouldn't it be `Value::Null`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
martin-g commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1312141745

   I wonder whether we can use https://serde.rs/enum-representations.html#untagged to give a hint to apache_avro (de)serialization process to keep the old behavior.
   I am looking into serde's source code to see how impls like apache_avro can know that attribute is in use.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
martin-g commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1307861026

   The avro_derive tests fail ...
   I'll investigate soon!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
martin-g commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1286636100

   > Yes, this is good!
   
   Is it ?
   IMO the values should always be wrapped in `Value::Union`, so that it is possible to validate it with a Schema. The union items should be:
   - Value::String for unit_variant
   - Value::Array for newtype_variant and tuple_variant
   - Value::Record for struct_variant


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] LucasJavaudin commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
LucasJavaudin commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1286625453

   > @LucasJavaudin Could you please review the new unit tests ? Thanks!
   
   Yes, this is good!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
martin-g commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1313232123

   Another user asked for the same functionality at https://github.com/serde-rs/serde/issues/2309


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] LucasJavaudin commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
LucasJavaudin commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1310031055

   @martin-g Thank you for your work! This solves the issue I raised in [AVRO-3646](https://issues.apache.org/jira/browse/AVRO-3646).
   
   I pushed two new integration tests in #1949 that try to deserialize and serialize more complex enums. The tests do not pass for multiple reasons. I think that would improve greatly the library if we could (de)serialize arbitrary enums (and even derive a schema for them) but maybe this needs to much work?
   
   Unfortunately, I don't have enough time myself to do more than adding tests and reviewing your changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
martin-g commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1311661938

   > The avro_derive tests fail ...
   
   The problem here is that another user (@jklamer, the contributor of the avro_derive crate) expects that 
   ```rust
   #[derive(Debug, Deserialize, Serialize, AvroSchema, Clone, PartialEq, Eq)]
   enum MyEnum {
       Foo,
       Bar,
       Baz,
   }
   ```
   should map to this Avro schema:
   ```
   {
     "name":"myenum",
     "type":{
       "type":"enum",
        "name":"MyEnum",
        "symbols":["Foo", "Bar", "Baz"]
      },
      "default":"Foo"
   }
   ```
   which is really nice! but unfortunately does not work anymore with the new way of supporting complex Rust enums (ones with non-null tuple/newtype/struct variants).
   
   The only way to still support the above I see is to introduce a second `Serializer` impl, e.g. UnitEnumSerializer and let the user decide which one to use. But the problem is that a `Writer` could use either the complex Serializer or the UnitEnumSerializer. There won't be a way to use one serializer for a given RecordField and another serializer for another field.
   
   Also CC @spiegela 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] LucasJavaudin commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
LucasJavaudin commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1286635404

   Can I push to this Draft a test that do not pass but that I think should pass?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
martin-g commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1286651463

   There might be more than one unit variants in the enum. I think it should be `Value::String` with the name of the variant, to be able to deserialize.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
martin-g commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1307819365

   @LucasJavaudin I think the PR is ready for final review!
   I'd be thankful if you can provide more use cases / unit tests!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] LucasJavaudin commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
LucasJavaudin commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1286681275

   Consider the following enum:
   ```
   enum MyEnum {
       Val1,
       Val2,
       Val3(u64),
   }
   ```
   What should be its schema?
   
   I can think of two options.
   
   ## A record with two fields
   
   - Field "type" which is an enum with symbols "Val1", "Val2", "Val3".
   - Field "value" which is an union.
   
   The union cannot be `["null", "null", "long"]` (or `["string", "string", "long"]`) because we cannot have the same type twice. Correct?
   
   Can it be just `["null", "long"]` or `[enum(["Val1"]), enum(["Val2"]), "long"]`? Neither seems intuitive to me.
   
   ## An union
   
   With the enum `[enum(["Val1"]), enum(["Val2"]), "long"]`, we can know both the variant and the data (if any) but this requires to match the `Val3(u64)` variant to "long", without using the variant name. (Can we match variants by their order?)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
martin-g commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1313925855

   The actual problem in ser.rs and de.rs.
   This PR changes the way apache_avro serializes objects/instances into `apache_avro::types::Value`.
   For now I see no way to make the serde logic more dynamic, i.e. depending on the enum variants' types or on container/field's attribute to decide what kind of `Value` to construct.
   
   The new serde logic is powerful enough to handle complex kinds of Rust enums (by using Schema::Record([Schema::Enum, Schema::Union])). But it does not support the old representation (Schema::Enum) because all variants are mapped to Value::Null and there could be at most one `null` in an Avro Union.
   
   The logic what schema to derive looks like a simpler task (although I haven't tried it yet, so I might be wrong).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] markfarnan commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by "markfarnan (via GitHub)" <gi...@apache.org>.
markfarnan commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1466859552

   Seems like AVRO-3695 should be addressed in this issue.  (see https://github.com/apache/avro/pull/2035)
   
   However I just tested it against Master + AVRO-3631 and it does not address the problem. 
   
   This is my last blocking issue for using Rust for our industry protocol, so hoping it can be looked at.   @martin-g  ?
   
   I've created what is probably a 'Hack' fix that addresses the issue for me.  But honestly I'm not sure what i'm doing here, so it likley breaks other stuff.   (So I havn't made a PR, just added here for your info)
   
   ser.rs
   `    fn serialize_newtype_variant<T: ?Sized>(
           self,
           _enum_name: &'static str,
           index: u32,
           _variant: &'static str,
           value: &T,
       ) -> Result<Self::Ok, Self::Error>
       where
           T: Serialize,
       {
           let serialized = value.serialize(self)?;
           let _idx = match serialized {
               // assumes that the Null variant is always the first one, as in Avro's Union
               Value::Null => 0,
               _ => index,
           };
           // Ok(Value::Record(vec![
           //     ("type".to_owned(), Value::Enum(index, variant.to_owned())),
           //     ("value".to_owned(), Value::Union(idx, Box::new(serialized))),
           // ]))
   
           Ok(Value::Union(index, Box::new(serialized)))
       }`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
martin-g commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1285189768

   @LucasJavaudin Could you please review the new unit tests ? Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] martin-g commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
martin-g commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1286637017

   > Can I push to this Draft a test that do not pass but that I think should pass?
   
   Only Avro project committers can push to this repository. You will have to open a PR against the branch for this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] jklamer commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
jklamer commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1312075157

   Yeah, this conversation is split over a couple different PR's, threads, and e-mails. But I'll respond here. With `avro_derive` I was very intentional about only tackling cases that I could make work automatically with serde and the normal serializer for the best new user experience. Modeling a Rust enum as some Avro entity opened too many cans of worms to handle automatically, having something opt in would be the only way to go but would also make it harder to mix Rust enums and Avro enums clearly (as mentioned above), to say nothing of the ambiguity of the deserialization process. 
   
   My bias would be to create a Macro that creates an intermediate form that is unambiguously convertible between Rust Enum and an Avro compatible entity and do the conversion before serialization and after deserialization  (don't try to make serde process work outside the avro spec) (and could point to macro in the AvroSchema derive error), but I personally haven't been able to think of a solution that works for every case. Rust's type system is just most expansive that Avro's, so the ambiguity is inevitable at some level.
   
   That being said, I LOVE Rust enums, with a passion, and when I realized I couldn't automatically serde them... [I started something that could](https://github.com/jklamer/gluino). <-- me shaving the yak.
   
   TLDR: I think it's impossible to do automatically without creating ambiguity that I would like to shield most users from. I think doing it as an opt in with clearly defined restrictions (and good error messages) would be a good to go for advanced users. 
   
   CC @spiegela
   CC @LucasJavaudin 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] jklamer commented on pull request #1921: AVRO-3646: serde for enum with mixed variants

Posted by GitBox <gi...@apache.org>.
jklamer commented on PR #1921:
URL: https://github.com/apache/avro/pull/1921#issuecomment-1312154963

   @martin-g hahaha agreed. And now with Java Sealed classes, a pretty clear Rust Enum 1-1 could be made too!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org