Posted to commits@hudi.apache.org by "trushev (via GitHub)" <gi...@apache.org> on 2023/02/21 02:58:03 UTC

[GitHub] [hudi] trushev commented on a diff in pull request #7895: [HUDI-5736] Common de-coupling column drop flag and schema validation flag

trushev commented on code in PR #7895:
URL: https://github.com/apache/hudi/pull/7895#discussion_r1112485560


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/HoodieTable.java:
##########
@@ -799,27 +803,38 @@ public TaskContextSupplier getTaskContextSupplier() {
    * GenericRecords with writerSchema. Hence, we need to ensure that this conversion can take place without errors.
    */
   private void validateSchema() throws HoodieUpsertException, HoodieInsertException {
-
-    if (!shouldValidateAvroSchema() || getActiveTimeline().getCommitsTimeline().filterCompletedInstants().empty()) {
+    boolean allowProjection = config.shouldAllowAutoEvolutionColumnDrop();
+    boolean shouldValidate = shouldValidateAvroSchema();
+    if ((allowProjection && !shouldValidate)
+        || getActiveTimeline().getCommitsTimeline().filterCompletedInstants().empty()) {
       // Check not required
       return;
     }
 
     Schema tableSchema;
     Schema writerSchema;
-    boolean isValid;
+    String errorMessage = null;
     try {
       TableSchemaResolver schemaResolver = new TableSchemaResolver(getMetaClient());
       writerSchema = HoodieAvroUtils.createHoodieWriteSchema(config.getSchema());
-      tableSchema = HoodieAvroUtils.createHoodieWriteSchema(schemaResolver.getTableAvroSchemaWithoutMetadataFields());
-      isValid = isSchemaCompatible(tableSchema, writerSchema, config.shouldAllowAutoEvolutionColumnDrop());
+      tableSchema = HoodieAvroUtils.createHoodieWriteSchema(schemaResolver.getTableAvroSchema(false));
+      if (!allowProjection && !AvroSchemaUtils.canProject(tableSchema, writerSchema)) {
+        errorMessage = String.format("Column dropping is not allowed. Use %s to disable this check", SCHEMA_ALLOW_AUTO_EVOLUTION_COLUMN_DROP.key());
+      } else if (shouldValidate && !isSchemaCompatible(tableSchema, writerSchema)) {

Review Comment:
   @danny0405 Could you please take a look again?
   - added `canProject(prevSchema, newSchema, exceptCols)` to avoid a collision between `hoodie.datasource.write.schema.allow.auto.evolution.column.drop=false` and `hoodie.datasource.write.drop.partition.columns=true` (see the sketch after this list)
   - fixed the errors in Spark by adding `SCHEMA_ALLOW_AUTO_EVOLUTION_COLUMN_DROP.key -> "true"` to the Spark merge config
   - moved the Avro schema check to `AvroSchemaUtils` so it can be unit-tested in `TestAvroSchemaUtils`
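   
   For reference, here is a minimal sketch of what a `canProject` overload with excluded columns could look like. The signature, the name-only field matching, and the class name are illustrative assumptions, not the actual PR code:
   
   ```java
   import java.util.Collections;
   import java.util.Set;
   
   import org.apache.avro.Schema;
   import org.apache.avro.SchemaBuilder;
   
   public class CanProjectSketch {
   
     // Hypothetical overload: the new schema "projects" the previous one if
     // every previous column is still present, except for columns explicitly
     // excluded (e.g. partition columns removed when
     // hoodie.datasource.write.drop.partition.columns=true).
     static boolean canProject(Schema prevSchema, Schema newSchema, Set<String> exceptCols) {
       return prevSchema.getFields().stream()
           .filter(f -> !exceptCols.contains(f.name()))
           .allMatch(f -> newSchema.getField(f.name()) != null);
     }
   
     public static void main(String[] args) {
       Schema prev = SchemaBuilder.record("rec").fields()
           .requiredString("id").requiredString("dt").endRecord();
       Schema next = SchemaBuilder.record("rec").fields()
           .requiredString("id").endRecord();
   
       // "dt" is missing from the new schema, but as an excluded partition
       // column its absence is not treated as a column drop.
       System.out.println(canProject(prev, next, Collections.singleton("dt"))); // true
       // Without the exclusion, the same write is rejected as a column drop.
       System.out.println(canProject(prev, next, Collections.emptySet()));      // false
     }
   }
   ```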
   
   > do we still need the validation in the original `HoodieSparkSqlWriter`
   
   I think we do, because the writer does not use `HoodieTable.validate()`.
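   
   To make the interplay of the two flags concrete, here is a self-contained sketch of the decision flow from the diff above. The name-only projection check and the Avro `SchemaCompatibility` call are stand-ins for the real `AvroSchemaUtils`/`isSchemaCompatible` logic, so treat this as an assumption-laden illustration rather than the Hudi implementation:
   
   ```java
   import org.apache.avro.Schema;
   import org.apache.avro.SchemaBuilder;
   import org.apache.avro.SchemaCompatibility;
   
   public class ValidateSchemaSketch {
   
     // Stand-in for the projection check: no table column may be dropped.
     static boolean canProject(Schema tableSchema, Schema writerSchema) {
       return tableSchema.getFields().stream()
           .allMatch(f -> writerSchema.getField(f.name()) != null);
     }
   
     // Stand-in for isSchemaCompatible: can data written with writerSchema
     // be read back using tableSchema?
     static boolean isCompatible(Schema tableSchema, Schema writerSchema) {
       return SchemaCompatibility
           .checkReaderWriterCompatibility(tableSchema, writerSchema)
           .getType() == SchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
     }
   
     public static void main(String[] args) {
       Schema table = SchemaBuilder.record("rec").fields()
           .requiredString("id").requiredInt("age").endRecord();
       Schema writer = SchemaBuilder.record("rec").fields()
           .requiredString("id").endRecord(); // "age" dropped
   
       // The two flags from the diff, now evaluated independently.
       boolean allowColumnDrop = false; // SCHEMA_ALLOW_AUTO_EVOLUTION_COLUMN_DROP
       boolean shouldValidate = true;   // shouldValidateAvroSchema()
   
       if (!allowColumnDrop && !canProject(table, writer)) {
         System.out.println("Rejected: column dropping is not allowed");
       } else if (shouldValidate && !isCompatible(table, writer)) {
         System.out.println("Rejected: writer schema is incompatible with table schema");
       } else {
         System.out.println("Write accepted");
       }
     }
   }
   ```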


