Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/07/15 16:23:49 UTC

[GitHub] [beam] clairemcginty commented on pull request #14410: [BEAM-2303] Support SpecificData in AvroCoder

clairemcginty commented on pull request #14410:
URL: https://github.com/apache/beam/pull/14410#issuecomment-880838488


   Hi @iemejia / @pabloem / @Amar3tto. This PR introduced some hidden bugs for us when upgrading from Beam 2.29.0 to 2.30.0. It changes the default `CharSequence` representation in decoded Avro string fields. When using `ReflectDatum{Reader,Writer}`, `CharSequence`s are backed by `String`s by default [[1]](https://github.com/apache/avro/blob/release-1.8.2/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectDatumReader.java#L229). The switch to `SpecificDatum{Reader,Writer}` means that, unless the Avro field property `java-class` is set to `java.lang.String` for every string field, `CharSequence`s are now backed by `org.apache.avro.util.Utf8` by default [[2]](https://github.com/apache/avro/blob/release-1.8.2/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L408). A lot of our users were relying on the default representation being `String` and are now seeing runtime errors in their pipelines. Finally, `Utf8`s aren't serializable, so there's no default `Coder` implementation for them; users would have to convert them to Java strings anyway if they wanted to do, for example, a `GroupByKey` (GBK) operation on an Avro field. I created a quick Gist to demonstrate the problem: [[3]](https://gist.github.com/clairemcginty/97ee6b33c0b5633d5d42d29b1d057d85).
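
To illustrate the failure mode without pulling in Avro or Beam, here is a minimal sketch using only the JDK. `FakeUtf8` is a hypothetical stand-in for `org.apache.avro.util.Utf8` (it is not the real class), and `CharSequenceDemo`, `unsafeCast`, `normalize`, and `castFails` are names made up for this example. It only shows why code that assumed a `String` behind the `CharSequence` breaks once the backing type changes, and that calling `toString()` works for either backing type.

```java
import java.nio.charset.StandardCharsets;

public class CharSequenceDemo {

    /**
     * Hypothetical stand-in for org.apache.avro.util.Utf8 (an assumption for
     * this sketch): a CharSequence backed by UTF-8 bytes that is NOT a String.
     */
    static final class FakeUtf8 implements CharSequence {
        private final byte[] bytes;

        FakeUtf8(String s) {
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        @Override public int length() { return toString().length(); }
        @Override public char charAt(int index) { return toString().charAt(index); }
        @Override public CharSequence subSequence(int start, int end) {
            return toString().subSequence(start, end);
        }
        @Override public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    /** Pattern that breaks: assumes the decoded field is always a String. */
    static String unsafeCast(CharSequence field) {
        return (String) field;
    }

    /** Pattern that works regardless of the backing CharSequence type. */
    static String normalize(CharSequence field) {
        return field == null ? null : field.toString();
    }

    /** Returns true if the String cast throws for the given field value. */
    static boolean castFails(CharSequence field) {
        try {
            unsafeCast(field);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        CharSequence stringBacked = "user-1";              // String-backed field
        CharSequence utf8Backed = new FakeUtf8("user-1");  // non-String-backed field

        System.out.println(castFails(stringBacked));       // false
        System.out.println(castFails(utf8Backed));         // true
        System.out.println(normalize(utf8Backed).equals(normalize(stringBacked))); // true
    }
}
```

In a real pipeline the equivalent workaround would be to call `toString()` on each decoded string field before using it as a key, at the cost of an extra copy per field.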
   
   Is this something I could bring to the dev@ or user@ mailing list? Let me know what you think.

