Posted to commits@hudi.apache.org by "codope (via GitHub)" <gi...@apache.org> on 2023/02/05 15:35:41 UTC

[GitHub] [hudi] codope commented on a diff in pull request #7856: [HUDI-5704] De-coupling column drop flag and schema validation flag (0.13.0)

codope commented on code in PR #7856:
URL: https://github.com/apache/hudi/pull/7856#discussion_r1096724818


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -455,7 +455,22 @@ object HoodieSparkSqlWriter {
           //       w/ the table's one and allow schemas to diverge. This is required in cases where
           //       partial updates will be performed (for ex, `MERGE INTO` Spark SQL statement) and as such
           //       only incoming dataset's projection has to match the table's schema, and not the whole one
-          if (!shouldValidateSchemasCompatibility || isSchemaCompatible(latestTableSchema, canonicalizedSourceSchema, allowAutoEvolutionColumnDrop)) {
+
+          if (!shouldValidateSchemasCompatibility) {
+            // if no validation is enabled, check for col drop
+            // if col drop is allowed, go ahead. if not, check for projection, so that we do not allow dropping cols
+            if (allowAutoEvolutionColumnDrop || canProject(latestTableSchema, canonicalizedSourceSchema)) {

Review Comment:
   `canProject` returns false if the column names differ, so ingestion will fail. However, we currently allow `MERGE INTO` with source column names that differ from the table's, and we even allow the same column name with different case.
   This changes the behavior compared to the previous release. Is this something we want to do? If so, what happens when users come back asking for a workaround to unblock their pipelines? We would have to tell them to set `hoodie.datasource.write.schema.allow.auto.evolution.column.drop=true`, which doesn't sound intuitive. Wdyt?
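   To make the concern concrete, here is a minimal Scala sketch under simplifying assumptions. It is not Hudi's implementation. Schemas are modeled as plain lists of column names. `canProjectSketch` is a hypothetical stand-in for `canProject` that matches names exactly and case-sensitively, as the concern assumes. `writeAllowed` mirrors only the condition in the new `if` branch, which applies when schema validation is disabled.

```scala
object ColumnDropGateSketch {
  // Hypothetical simplified schema: only column names, not Hudi's Avro-based schema types.
  final case class SimpleSchema(columns: Seq[String])

  // Hypothetical stand-in for canProject: true when every table column appears,
  // with exactly the same (case-sensitive) name, in the incoming source schema,
  // i.e. the write would not drop any column.
  def canProjectSketch(table: SimpleSchema, source: SimpleSchema): Boolean = {
    val sourceCols = source.columns.toSet
    table.columns.forall(sourceCols.contains)
  }

  // Mirrors the branch in the diff: proceed if column drop is allowed,
  // otherwise only if the source projects onto the table schema.
  def writeAllowed(allowAutoEvolutionColumnDrop: Boolean,
                   table: SimpleSchema,
                   source: SimpleSchema): Boolean =
    allowAutoEvolutionColumnDrop || canProjectSketch(table, source)

  def main(args: Array[String]): Unit = {
    val table = SimpleSchema(Seq("id", "name", "price"))
    // Same columns, but two names differ only in case.
    val differentCase = SimpleSchema(Seq("ID", "Name", "price"))

    println(writeAllowed(allowAutoEvolutionColumnDrop = false, table = table, source = differentCase)) // prints false
    println(writeAllowed(allowAutoEvolutionColumnDrop = true, table = table, source = differentCase))  // prints true
  }
}
```

   With the column-drop flag off, a source whose column names differ only in case is rejected. The write only goes through once the column-drop flag is turned on, which is the unintuitive workaround described in the comment.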
   



-- 
This is an automated message from the Apache Git Service.