Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/02/02 10:17:59 UTC

[GitHub] [spark] attilapiros commented on a change in pull request #31133: [SPARK-26836][SQL] Supporting Avro schema evolution for partitioned Hive tables

attilapiros commented on a change in pull request #31133:
URL: https://github.com/apache/spark/pull/31133#discussion_r568483231



##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala
##########
@@ -388,6 +394,9 @@ private[hive] object HiveTableUtil {
 private[hive] object DeserializerLock
 
 private[hive] object HadoopTableReader extends HiveInspectors with Logging {
+
+  val avroTableProperties = AvroTableProperties.values().map(_.getPropName()).toSet

Review comment:
       The `AvroTableProperties` enum is used to avoid hardcoding the names of the Avro-related properties, that is, the properties for which the table-level value takes priority over the partition-level value when one is chosen.
   
   So this PR does NOT set any of those properties if the user hasn't specified one via `SERDEPROPERTIES` on the table or on the partition. Here we only change which one wins before passing the values on to the SerDe initialization.
   
   `avro.schema.literal` on its own is not enough. For example, if the user set a different `avro.schema.url` on the table than on the partition, we would have the same issue. This can be true for all of these properties: it is best if they all come from one consistent source, either the table or the partition. The advantage of using `AvroTableProperties` is that we do not have to investigate the consistency between these properties one by one but can just take them all from the same source. When a new property that matters for the SerDe is introduced in the future, we will still be safe, and this part will not silently generate a new bug (as we cannot write tests for SerDe properties that do not exist yet).
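   
   To make the intended priority concrete, here is a minimal sketch; it is not the code in this PR. It assumes plain Scala `Map`s instead of `java.util.Properties`, a hand-written two-element set standing in for the names derived from the `AvroTableProperties` enum, and a hypothetical helper named `mergeSerdeProperties`:

```scala
object AvroPropertyPrioritySketch {
  // Stand-in for AvroTableProperties.values().map(_.getPropName()).toSet.
  // Assumption: only the two property names mentioned in this comment are listed.
  val avroTableProperties: Set[String] =
    Set("avro.schema.literal", "avro.schema.url")

  // Hypothetical helper: partition properties normally override table
  // properties, except for Avro-related keys that are set on the table,
  // where the table value wins. Nothing is added that the user did not set.
  def mergeSerdeProperties(
      tableProps: Map[String, String],
      partProps: Map[String, String]): Map[String, String] = {
    val partitionOverrides = partProps.filterNot { case (key, _) =>
      avroTableProperties.contains(key) && tableProps.contains(key)
    }
    tableProps ++ partitionOverrides
  }
}
```

   With `avro.schema.url` set to `table.avsc` on the table and to `partition.avsc` on the partition, the merged map keeps `table.avsc`, while a non-Avro property set on both sides still takes the partition value. An Avro property set only on the partition is passed through unchanged in this sketch.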
   
   What do you think?
    



