Posted to commits@hudi.apache.org by "nsivabalan (via GitHub)" <gi...@apache.org> on 2023/02/05 05:34:26 UTC

[GitHub] [hudi] nsivabalan opened a new pull request, #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

nsivabalan opened a new pull request, #7857:
URL: https://github.com/apache/hudi/pull/7857

   ### Change Logs
   
   Decouples the column drop flag from the schema validation flag. Previously the two were tightly coupled: the column drop flag was only consulted when schema validation was enabled.
   For example:
   - the table schema is col1, col2, col3
   - the new incoming schema is col1, col2
   - the column drop config is set to false (column drops should not be supported) and schema validation is set to false
   
   In this case the commit succeeds, but the expectation is that the commit for this new batch should fail.
   
   This patch fixes that case by decoupling the two flags. The column drop flag is now honored irrespective of whether schema validation is enabled.
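
As a rough illustration of the coupling described above (not the actual Hudi code), the sketch below compares the old and new acceptance conditions for a batch that drops a column. The class and method names are hypothetical, and `compatible` is only a stand-in for `isSchemaCompatible`, assuming the dropped column is the sole difference between the two schemas.

```java
final class ColumnDropFlags {
    // Hypothetical stand-in for isSchemaCompatible(prev, new, allowDrop):
    // assumes the only possible incompatibility is a dropped column.
    static boolean compatible(boolean allowColumnDrop, boolean incomingDropsColumn) {
        return allowColumnDrop || !incomingDropsColumn;
    }

    // Old, coupled condition: when validation is off, the compatibility
    // check (and with it the column drop flag) is skipped entirely.
    static boolean acceptedBefore(boolean validateSchema, boolean allowColumnDrop,
                                  boolean incomingDropsColumn) {
        return !validateSchema || compatible(allowColumnDrop, incomingDropsColumn);
    }

    // New, decoupled condition: the column drop flag is checked even
    // when validation is off.
    static boolean acceptedAfter(boolean validateSchema, boolean allowColumnDrop,
                                 boolean incomingDropsColumn) {
        if (!validateSchema) {
            return allowColumnDrop || !incomingDropsColumn;
        }
        return compatible(allowColumnDrop, incomingDropsColumn);
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        // Incoming batch has col1, col2 while the table has col1, col2, col3,
        // so a column is dropped in every row of this comparison.
        boolean incomingDropsColumn = true;
        for (boolean validate : flags) {
            for (boolean allowDrop : flags) {
                System.out.println("validate=" + validate
                    + " allowDrop=" + allowDrop
                    + " before=" + acceptedBefore(validate, allowDrop, incomingDropsColumn)
                    + " after=" + acceptedAfter(validate, allowDrop, incomingDropsColumn));
            }
        }
    }
}
```

Under these assumptions, only one combination changes outcome: with validation off and column drop not allowed, the batch was accepted before and is rejected after.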
   
   ### Impact
   
   Column drop flag will be honored in all cases. 
   
   ### Risk level (write none, low, medium or high below)
   
   low.
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7857:
URL: https://github.com/apache/hudi/pull/7857#issuecomment-1416966322

   ## CI report:
   
   * d75f5edf0672a8adf5ac07cb56dc710169ad3b44 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] trushev commented on a diff in pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "trushev (via GitHub)" <gi...@apache.org>.
trushev commented on code in PR #7857:
URL: https://github.com/apache/hudi/pull/7857#discussion_r1098203988


##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -72,6 +69,18 @@ public static boolean isSchemaCompatible(Schema prevSchema, Schema newSchema, bo
     return result.getType() == AvroSchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
   }
 
+  /**
+   * Check that each field in the prevSchema can be populated in the newSchema
+   * @param prevSchema prev schema.
+   * @param newSchema new schema
+   * @return true if prev schema is a projection of new schema.
+   */

Review Comment:
   @danny0405, yes, we need similar changes.
   I tried the following Flink job on the latest master branch:
   ```sql
   -- Step 1.
   create table tbl(
       `uuid` bigint,
       `name` string,
       `age` int null, -- column to be dropped
       `part` int
   ) partitioned by (`part`) with (
       'connector'='hudi',
       'path'='/tmp/tbl',
       'hoodie.avro.schema.validate'='false',
       'hoodie.datasource.write.schema.allow.auto.evolution.column.drop'='false'
   );
   insert into tbl values (1, 'Danny', 23, 10);
   drop table tbl;
   
   -- Step 2.
   create table tbl(
       `uuid` bigint,
       `name` string,
       `part` int
   ) partitioned by (`part`) with (
       'connector'='hudi',
       'path'='/tmp/tbl',
       'hoodie.avro.schema.validate'='false',
       'hoodie.datasource.write.schema.allow.auto.evolution.column.drop'='false'
   );
   insert into tbl values (2, 'Stephen', 10); -- failure expected
   select * from tbl;
   ```
   Expected behavior:
   ```
   Exception -- column dropping is not allowed
   ```
   Actual behavior:
   ```
   +----+-------+----------+-------+
   | op |  uuid |     name |  part |
   +----+-------+----------+-------+
   | +I |     1 |    Danny |    10 |
   | +I |     2 |  Stephen |    10 |
   +----+-------+----------+-------+
   
   ```
   
   





[GitHub] [hudi] hudi-bot commented on pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7857:
URL: https://github.com/apache/hudi/pull/7857#issuecomment-1418394974

   ## CI report:
   
   * d75f5edf0672a8adf5ac07cb56dc710169ad3b44 UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   * 5623ce8349f2d49974543460378410840e194e56 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14946) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7857:
URL: https://github.com/apache/hudi/pull/7857#issuecomment-1417092314

   ## CI report:
   
   * d75f5edf0672a8adf5ac07cb56dc710169ad3b44 UNKNOWN
   * 177be7b54cbe98f305cd895f1c277438d922f5c7 UNKNOWN
   


[GitHub] [hudi] trushev commented on a diff in pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "trushev (via GitHub)" <gi...@apache.org>.
trushev commented on code in PR #7857:
URL: https://github.com/apache/hudi/pull/7857#discussion_r1098212349


##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -72,6 +69,18 @@ public static boolean isSchemaCompatible(Schema prevSchema, Schema newSchema, bo
     return result.getType() == AvroSchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
   }
 
+  /**
+   * Check that each field in the prevSchema can be populated in the newSchema
+   * @param prevSchema prev schema.
+   * @param newSchema new schema
+   * @return true if prev schema is a projection of new schema.
+   */

Review Comment:
   Yes, I took a brief look at the validation mechanism. It looks like I can handle this fix.





[GitHub] [hudi] danny0405 commented on a diff in pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #7857:
URL: https://github.com/apache/hudi/pull/7857#discussion_r1097257362


##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -72,6 +69,18 @@ public static boolean isSchemaCompatible(Schema prevSchema, Schema newSchema, bo
     return result.getType() == AvroSchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
   }
 
+  /**
+   * Check that each field in the prevSchema can be populated in the newSchema
+   * @param prevSchema prev schema.
+   * @param newSchema new schema
+   * @return true if prev schema is a projection of new schema.
+   */

Review Comment:
   @trushev, do you think we need similar changes on the Flink side?





[GitHub] [hudi] nsivabalan merged pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan merged PR #7857:
URL: https://github.com/apache/hudi/pull/7857




[GitHub] [hudi] hudi-bot commented on pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7857:
URL: https://github.com/apache/hudi/pull/7857#issuecomment-1418384665

   ## CI report:
   
   * d75f5edf0672a8adf5ac07cb56dc710169ad3b44 UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   * 5623ce8349f2d49974543460378410840e194e56 UNKNOWN
   


[GitHub] [hudi] danny0405 commented on a diff in pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #7857:
URL: https://github.com/apache/hudi/pull/7857#discussion_r1098205040


##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -72,6 +69,18 @@ public static boolean isSchemaCompatible(Schema prevSchema, Schema newSchema, bo
     return result.getType() == AvroSchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
   }
 
+  /**
+   * Check that each field in the prevSchema can be populated in the newSchema
+   * @param prevSchema prev schema.
+   * @param newSchema new schema
+   * @return true if prev schema is a projection of new schema.
+   */

Review Comment:
   Thanks for checking. Is there any possibility that you can submit a follow-up fix then?







[GitHub] [hudi] xushiyan commented on a diff in pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "xushiyan (via GitHub)" <gi...@apache.org>.
xushiyan commented on code in PR #7857:
URL: https://github.com/apache/hudi/pull/7857#discussion_r1096631582


##########
hudi-common/src/main/java/org/apache/hudi/avro/AvroSchemaUtils.java:
##########
@@ -72,6 +69,18 @@ public static boolean isSchemaCompatible(Schema prevSchema, Schema newSchema, bo
     return result.getType() == AvroSchemaCompatibility.SchemaCompatibilityType.COMPATIBLE;
   }
 
+  /**
+   * Check that each field in the prevSchema can be populated in the newSchema
+   * @param prevSchema prev schema.
+   * @param newSchema new schema
+   * @return true if prev schema is a projection of new schema.
+   */
+  public static boolean checkProjection(Schema prevSchema, Schema newSchema) {

Review Comment:
   Better named `canProject()` to imply a boolean return.



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -455,7 +455,22 @@ object HoodieSparkSqlWriter {
           //       w/ the table's one and allow schemas to diverge. This is required in cases where
           //       partial updates will be performed (for ex, `MERGE INTO` Spark SQL statement) and as such
           //       only incoming dataset's projection has to match the table's schema, and not the whole one
-          if (!shouldValidateSchemasCompatibility || isSchemaCompatible(latestTableSchema, canonicalizedSourceSchema, allowAutoEvolutionColumnDrop)) {
+
+          if (!shouldValidateSchemasCompatibility) {
+            // if no validation is enabled, check for col drop
+            // if col drop is allowed, go ahead. if not, check for projection, so that we do not allow dropping cols
+            if (allowAutoEvolutionColumnDrop || checkProjection(latestTableSchema, canonicalizedSourceSchema)) {
+              canonicalizedSourceSchema

Review Comment:
   So just to confirm the logic: if users explicitly allow column drop, or if the writer schema can be projected to the table schema (no column drop), we honor the writer schema.
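
The projection check in the comment above can be pictured as a containment test on column names. The following is a minimal sketch under that assumption, not the `AvroSchemaUtils` implementation: schemas are reduced to lists of column names (taken from the Flink example in this thread, plus a hypothetical `city` column), and `ProjectionSketch` and `honorWriterSchema` are made-up names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ProjectionSketch {
    // True when every table column is still present in the writer schema,
    // i.e. the writer did not drop any column. Order is ignored.
    static boolean canProject(List<String> tableColumns, List<String> writerColumns) {
        Set<String> writer = new HashSet<>(writerColumns);
        return tableColumns.stream().allMatch(writer::contains);
    }

    // Branch taken when schema validation is disabled (per the diff):
    // honor the writer schema if drops are allowed or nothing was dropped.
    static boolean honorWriterSchema(boolean allowColumnDrop,
                                     List<String> tableColumns,
                                     List<String> writerColumns) {
        return allowColumnDrop || canProject(tableColumns, writerColumns);
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("uuid", "name", "age", "part");
        List<String> dropsAge = Arrays.asList("uuid", "name", "part");
        List<String> addsCity = Arrays.asList("uuid", "name", "age", "part", "city");

        System.out.println("drop, not allowed: " + honorWriterSchema(false, table, dropsAge));
        System.out.println("drop, allowed:     " + honorWriterSchema(true, table, dropsAge));
        System.out.println("add, not allowed:  " + honorWriterSchema(false, table, addsCity));
    }
}
```

In this simplified model, adding a column still passes the check, while dropping `age` passes only when column drop is explicitly allowed.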



##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/hudi/TestAvroSchemaResolutionSupport.scala:
##########
@@ -206,12 +208,18 @@ class TestAvroSchemaResolutionSupport extends HoodieClientTestBase with ScalaAss
   }
 
   @ParameterizedTest
-  @ValueSource(booleans = Array(true, false))
-  def testDeleteColumn(isCow: Boolean): Unit = {
+  @CsvSource(value = Array(
+    "COPY_ON_WRITE,true",
+    "COPY_ON_WRITE,false",
+    "MERGE_ON_READ,true",
+    "MERGE_ON_READ,false"
+  ))
+  def testDeleteColumn(tableType: String, schemaValidationEnable : Boolean): Unit = {

Review Comment:
   nit: `schemaValidationEnabled`





[GitHub] [hudi] nsivabalan commented on pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on PR #7857:
URL: https://github.com/apache/hudi/pull/7857#issuecomment-1418269077

   @hudi-bot run azure






[GitHub] [hudi] hudi-bot commented on pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7857:
URL: https://github.com/apache/hudi/pull/7857#issuecomment-1418390182

   ## CI report:
   
   * d75f5edf0672a8adf5ac07cb56dc710169ad3b44 UNKNOWN
   * 5623ce8349f2d49974543460378410840e194e56 UNKNOWN
   *  Unknown: [CANCELED](TBD) 
   


[GitHub] [hudi] hudi-bot commented on pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #7857:
URL: https://github.com/apache/hudi/pull/7857#issuecomment-1418557536

   ## CI report:
   
   * d75f5edf0672a8adf5ac07cb56dc710169ad3b44 UNKNOWN
   * 5623ce8349f2d49974543460378410840e194e56 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=14946) 
   


[GitHub] [hudi] nsivabalan commented on a diff in pull request #7857: [HUDI-5704] De-coupling column drop flag and schema validation flag

Posted by "nsivabalan (via GitHub)" <gi...@apache.org>.
nsivabalan commented on code in PR #7857:
URL: https://github.com/apache/hudi/pull/7857#discussion_r1096633834


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -455,7 +455,22 @@ object HoodieSparkSqlWriter {
           //       w/ the table's one and allow schemas to diverge. This is required in cases where
           //       partial updates will be performed (for ex, `MERGE INTO` Spark SQL statement) and as such
           //       only incoming dataset's projection has to match the table's schema, and not the whole one
-          if (!shouldValidateSchemasCompatibility || isSchemaCompatible(latestTableSchema, canonicalizedSourceSchema, allowAutoEvolutionColumnDrop)) {
+
+          if (!shouldValidateSchemasCompatibility) {
+            // if no validation is enabled, check for col drop
+            // if col drop is allowed, go ahead. if not, check for projection, so that we do not allow dropping cols
+            if (allowAutoEvolutionColumnDrop || checkProjection(latestTableSchema, canonicalizedSourceSchema)) {
+              canonicalizedSourceSchema

Review Comment:
   Yes, with the caveat that this applies only when schema validation is not enabled. If it is enabled, we follow the previous logic, which you can find below.
   


