Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/07/15 16:35:11 UTC

[GitHub] [beam] lukecwik commented on pull request #14410: [BEAM-2303] Support SpecificData in AvroCoder

lukecwik commented on pull request #14410:
URL: https://github.com/apache/beam/pull/14410#issuecomment-880845924


   It would be good to start a thread on ***@***.***
   
   On Thu, Jul 15, 2021 at 9:23 AM Claire McGinty ***@***.***>
   wrote:
   
   > Hi @iemejia <https://github.com/iemejia> / @pabloem
   > <https://github.com/pabloem> / @Amar3tto <https://github.com/Amar3tto>,
   > this PR introduced some hidden bugs for us when upgrading from Beam 2.29.0 to
   > 2.30.0. It changes the default CharSequence representation in decoded
   > Avro string fields. When using ReflectDatum{Reader,Writer}, CharSequences
   > are backed by default by Strings [1]
   > <https://github.com/apache/avro/blob/release-1.8.2/lang/java/avro/src/main/java/org/apache/avro/reflect/ReflectDatumReader.java#L229>.
   > This switch to SpecificDatum{Reader,Writer} means that, unless the Avro
   > field property java-class is set to java.lang.String for all String
   > fields, the CharSequences are now backed by default by
   > org.apache.avro.util.Utf8s [2]
   > <https://github.com/apache/avro/blob/release-1.8.2/lang/java/avro/src/main/java/org/apache/avro/generic/GenericDatumReader.java#L408>.
   > A lot of our users were relying on the default representation being Strings
   > and are now seeing runtime errors in their pipelines. Finally, Utf8s aren't
   > serializable, so there's no default Coder implementation for them, and
   > users would have to convert them to Java Strings anyway if they wanted to,
   > for example, do a GroupByKey (GBK) on an Avro field. I created a quick Gist to
   > demonstrate the problem: [3]
   > <https://gist.github.com/clairemcginty/97ee6b33c0b5633d5d42d29b1d057d85>.
   >
   > Is this something I could bring to the dev@ or user@ mailing list? Let me
   > know what you think.
   >
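The CharSequence pitfall described in the quoted comment can be illustrated without Avro on the classpath. This is a minimal sketch, assuming java.lang.StringBuilder as a stand-in for org.apache.avro.util.Utf8 (both implement CharSequence but are not java.lang.String). The class name CharSequencePitfall and the helper asString are hypothetical, not part of Beam or Avro.

```java
public class CharSequencePitfall {

    // Hypothetical helper: normalizes any CharSequence (e.g. a decoded Avro Utf8)
    // to a java.lang.String before it is used as a key or compared to a String.
    static String asString(CharSequence cs) {
        return cs == null ? null : cs.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a decoded Avro string field. Under SpecificDatumReader this
        // would typically be an org.apache.avro.util.Utf8; StringBuilder is used
        // here only so the example runs without the Avro dependency.
        CharSequence field = new StringBuilder("user-42");

        // 1. Code that assumed the field was a String now fails at runtime.
        try {
            String s = (String) field;
            System.out.println("cast succeeded: " + s);
        } catch (ClassCastException e) {
            System.out.println("cast failed: " + e.getClass().getSimpleName());
        }

        // 2. String.equals only matches other Strings, so this comparison
        //    silently returns false instead of throwing.
        System.out.println("equals without conversion: " + "user-42".equals(field));

        // 3. Converting explicitly restores the String-based behavior.
        System.out.println("equals after toString(): " + "user-42".equals(asString(field)));
    }
}
```

The explicit toString() conversion is the same step users would need before handing the value to a transform that requires a coder for String, such as a GroupByKey; the sketch only shows the Java-level behavior, not Beam's coder inference.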
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.