Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/11 17:31:28 UTC

[GitHub] [spark] thejdeep commented on pull request #36506: [SPARK-25050][SQL] Avro: writing complex unions

thejdeep commented on PR #36506:
URL: https://github.com/apache/spark/pull/36506#issuecomment-1212279813

   @steven-aerts Thanks for working on this feature.
   
   +1 to this PR. The lack of write support for complex union types causes problems for us too. Because the standard DataFrame/Dataset APIs do not support writing out unions with multiple subtypes, we currently either change the underlying schema, which may be cumbersome in some cases, or fall back to the [saveAsNewAPIHadoopFile](https://spark.apache.org/docs/3.0.0/api/scala/org/apache/spark/rdd/PairRDDFunctions.html#saveAsNewAPIHadoopFile(path:String,keyClass:Class%5B_%5D,valueClass:Class%5B_%5D,outputFormatClass:Class%5B_%3C:org.apache.hadoop.mapreduce.OutputFormat%5B_,_%5D%5D,conf:org.apache.hadoop.conf.Configuration):Unit) RDD API, which skips the Catalyst path.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

