Posted to commits@hudi.apache.org by "wecharyu (via GitHub)" <gi...@apache.org> on 2023/03/17 14:53:55 UTC

[GitHub] [hudi] wecharyu opened a new pull request, #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

wecharyu opened a new pull request, #8219:
URL: https://github.com/apache/hudi/pull/8219

   ### Change Logs
   Add a check of the insert operation config for better troubleshooting, especially for the case of upserting to a table without a preCombineKey.
   
   ### Impact
   
   None
   
   ### Risk level (write none, low, medium or high below)
   
   None
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1479069167

   ## CI report:
   
   * 64fab4bc69f94c44aeba7065adcaea5d24cf9973 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15771) 
   * a36b85f579d2ef7f1584b932f4bba28e3207405d UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1482952270

   ## CI report:
   
   * 3fed80ecdcfabf29904025fa84e2d08505351189 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15896) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1482561800

   ## CI report:
   
   * a36b85f579d2ef7f1584b932f4bba28e3207405d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15850) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15848) 
   * 3fed80ecdcfabf29904025fa84e2d08505351189 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1479639937

   ## CI report:
   
   * a36b85f579d2ef7f1584b932f4bba28e3207405d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15850) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15848) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] wecharyu commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "wecharyu (via GitHub)" <gi...@apache.org>.
wecharyu commented on code in PR #8219:
URL: https://github.com/apache/hudi/pull/8219#discussion_r1148304318


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestOperationConifg.scala:
##########
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{HoodieCatalogTable, SessionCatalog}
+import org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.buildHoodieInsertConfig
+
+import scala.reflect.ClassTag
+
+abstract class TestOperationConfig extends HoodieSparkSqlTestBase {
+  val catalog: SessionCatalog = spark.sessionState.catalog
+

Review Comment:
   This abstract class is meant to be inherited to test other operations like `buildHoodieDeleteTableConfig`, `buildHoodieDropPartitionsConfig`, etc. A companion object would be more reasonable if we do not need to test other operations.





[GitHub] [hudi] hudi-bot commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1479079341

   ## CI report:
   
   * 64fab4bc69f94c44aeba7065adcaea5d24cf9973 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15771) 
   * a36b85f579d2ef7f1584b932f4bba28e3207405d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15848) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] wecharyu commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "wecharyu (via GitHub)" <gi...@apache.org>.
wecharyu commented on code in PR #8219:
URL: https://github.com/apache/hudi/pull/8219#discussion_r1146470877


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -150,20 +150,7 @@ object HoodieSparkSqlWriter {
       case _ => throw new HoodieException("hoodie only support org.apache.spark.serializer.KryoSerializer as spark.serializer")
     }
     val tableType = HoodieTableType.valueOf(hoodieConfig.getString(TABLE_TYPE))
-    var operation = WriteOperationType.fromValue(hoodieConfig.getString(OPERATION))
-    // TODO clean up
-    // It does not make sense to allow upsert() operation if INSERT_DROP_DUPS is true
-    // Auto-correct the operation to "insert" if OPERATION is set to "upsert" wrongly
-    // or not set (in which case it will be set as "upsert" by parametersWithWriteDefaults()) .
-    if (hoodieConfig.getBoolean(INSERT_DROP_DUPS) &&

Review Comment:
   Since enabling `INSERT_DROP_DUPS` de-duplicates all incoming write data in both `UPSERT` and `INSERT` operations, I find it a bit hard to distinguish these two operations from the written data in this case.
   
   We may get the operationType from the latest `$tbl_path/.hoodie/$timestamp.commit` file. Do you think that is appropriate? Any suggestions would be much appreciated.
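   
   The idea above can be sketched as follows. This is only an illustration: it assumes a Hudi commit file is JSON containing an `"operationType"` field (as current commit metadata does), and uses a crude regex pull rather than Hudi's own metadata classes, purely to keep the snippet self-contained.
   
   ```scala
   // Illustrative sketch: extract the write operation type from the raw JSON
   // text of a $tbl_path/.hoodie/$timestamp.commit file. A real implementation
   // would read the file and use HoodieCommitMetadata instead of a regex.
   import scala.util.matching.Regex
   
   def operationTypeOf(commitJson: String): Option[String] = {
     // Matches e.g.  "operationType" : "UPSERT"
     val pattern: Regex = """"operationType"\s*:\s*"([A-Za-z_]+)"""".r
     pattern.findFirstMatchIn(commitJson).map(_.group(1))
   }
   ```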





[GitHub] [hudi] wecharyu commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "wecharyu (via GitHub)" <gi...@apache.org>.
wecharyu commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1482543299

   Add a new test class for operation config. CC: @danny0405 




[GitHub] [hudi] danny0405 commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8219:
URL: https://github.com/apache/hudi/pull/8219#discussion_r1148280602


##########
hudi-spark-datasource/hudi-spark/src/test/scala/org/apache/spark/sql/hudi/TestOperationConifg.scala:
##########
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hudi
+
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.catalog.{HoodieCatalogTable, SessionCatalog}
+import org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand.buildHoodieInsertConfig
+
+import scala.reflect.ClassTag
+
+abstract class TestOperationConfig extends HoodieSparkSqlTestBase {
+  val catalog: SessionCatalog = spark.sessionState.catalog
+

Review Comment:
   No need to make the class abstract; if you want to define some common utility methods, just declare a companion object with the same name.
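   
   The suggested structure can be sketched like this (a toy stand-in, not the actual Hudi test code: the class name is reused from the PR, but `deducedOperation` is a hypothetical shared utility):
   
   ```scala
   // A plain (non-abstract) test class, with shared utilities moved into a
   // companion object of the same name, as suggested above.
   class TestOperationConfig /* extends HoodieSparkSqlTestBase in the real PR */ {
     def testInsertWithoutPreCombine(): Boolean =
       TestOperationConfig.deducedOperation(hasPrecombineColumn = false) == "insert"
   }
   
   object TestOperationConfig {
     // Common utility that sibling test classes can call without inheritance.
     def deducedOperation(hasPrecombineColumn: Boolean): String =
       if (hasPrecombineColumn) "upsert" else "insert"
   }
   ```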





[GitHub] [hudi] wecharyu commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "wecharyu (via GitHub)" <gi...@apache.org>.
wecharyu commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1475779349

   @danny0405 @XuQianJin-Stars @yihua: Could you please review this PR?




[GitHub] [hudi] danny0405 commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8219:
URL: https://github.com/apache/hudi/pull/8219#discussion_r1144329261


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##########
@@ -141,28 +141,30 @@ trait ProvidesHoodieConfig extends Logging {
     // NOTE: Target operation could be overridden by the user, therefore if it has been provided as an input
     //       we'd prefer that value over auto-deduced operation. Otherwise, we deduce target operation type
     val operationOverride = combinedOpts.get(DataSourceWriteOptions.OPERATION.key)
-    val operation = operationOverride.getOrElse {
-      (enableBulkInsert, isOverwritePartition, isOverwriteTable, dropDuplicate, isNonStrictMode, isPartitionedTable) match {
-        case (true, _, _, _, false, _) =>
-          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert in ${insertMode.value()} mode.")
-        case (true, true, _, _, _, true) =>
-          throw new IllegalArgumentException(s"Insert Overwrite Partition can not use bulk insert.")
-        case (true, _, _, true, _, _) =>
-          throw new IllegalArgumentException(s"Bulk insert cannot support drop duplication." +
-            s" Please disable $INSERT_DROP_DUPS and try again.")
-        // if enableBulkInsert is true, use bulk insert for the insert overwrite non-partitioned table.
-        case (true, false, true, _, _, false) => BULK_INSERT_OPERATION_OPT_VAL
-        // insert overwrite table
-        case (false, false, true, _, _, _) => INSERT_OVERWRITE_TABLE_OPERATION_OPT_VAL
-        // insert overwrite partition
-        case (_, true, false, _, _, true) => INSERT_OVERWRITE_OPERATION_OPT_VAL
-        // disable dropDuplicate, and provide preCombineKey, use the upsert operation for strict and upsert mode.
-        case (false, false, false, false, false, _) if hasPrecombineColumn => UPSERT_OPERATION_OPT_VAL
-        // if table is pk table and has enableBulkInsert use bulk insert for non-strict mode.
-        case (true, _, _, _, true, _) => BULK_INSERT_OPERATION_OPT_VAL
-        // for the rest case, use the insert operation
-        case _ => INSERT_OPERATION_OPT_VAL
-      }
+    val operation = (operationOverride, enableBulkInsert, isOverwritePartition,
+                     isOverwriteTable, dropDuplicate, isNonStrictMode, isPartitionedTable, hasPrecombineColumn) match {
+      case (Some(UPSERT_OPERATION_OPT_VAL), _, _, _, _, _, _, false) =>
+        throw new IllegalArgumentException(s"Table without preCombineKey can not use upsert operation.")
+      case (Some(operation), _, _, _, _, _, _, _)  => operation

Review Comment:
   I guess we need to fetch the overridden operation type value first.





[GitHub] [hudi] danny0405 commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8219:
URL: https://github.com/apache/hudi/pull/8219#discussion_r1142958484


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##########
@@ -141,28 +141,30 @@ trait ProvidesHoodieConfig extends Logging {
     // NOTE: Target operation could be overridden by the user, therefore if it has been provided as an input
     //       we'd prefer that value over auto-deduced operation. Otherwise, we deduce target operation type
     val operationOverride = combinedOpts.get(DataSourceWriteOptions.OPERATION.key)
-    val operation = operationOverride.getOrElse {
-      (enableBulkInsert, isOverwritePartition, isOverwriteTable, dropDuplicate, isNonStrictMode, isPartitionedTable) match {
-        case (true, _, _, _, false, _) =>
-          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert in ${insertMode.value()} mode.")
-        case (true, true, _, _, _, true) =>
-          throw new IllegalArgumentException(s"Insert Overwrite Partition can not use bulk insert.")
-        case (true, _, _, true, _, _) =>
-          throw new IllegalArgumentException(s"Bulk insert cannot support drop duplication." +
-            s" Please disable $INSERT_DROP_DUPS and try again.")
-        // if enableBulkInsert is true, use bulk insert for the insert overwrite non-partitioned table.
-        case (true, false, true, _, _, false) => BULK_INSERT_OPERATION_OPT_VAL
-        // insert overwrite table
-        case (false, false, true, _, _, _) => INSERT_OVERWRITE_TABLE_OPERATION_OPT_VAL
-        // insert overwrite partition
-        case (_, true, false, _, _, true) => INSERT_OVERWRITE_OPERATION_OPT_VAL
-        // disable dropDuplicate, and provide preCombineKey, use the upsert operation for strict and upsert mode.
-        case (false, false, false, false, false, _) if hasPrecombineColumn => UPSERT_OPERATION_OPT_VAL
-        // if table is pk table and has enableBulkInsert use bulk insert for non-strict mode.
-        case (true, _, _, _, true, _) => BULK_INSERT_OPERATION_OPT_VAL
-        // for the rest case, use the insert operation
-        case _ => INSERT_OPERATION_OPT_VAL
-      }
+    val operation = (operationOverride, enableBulkInsert, isOverwritePartition,
+                     isOverwriteTable, dropDuplicate, isNonStrictMode, isPartitionedTable, hasPrecombineColumn) match {
+      case (Some(UPSERT_OPERATION_OPT_VAL), _, _, _, _, _, _, false) =>

Review Comment:
   The `operationOverride.getOrElse` logic seems to be missing here.





[GitHub] [hudi] wecharyu commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "wecharyu (via GitHub)" <gi...@apache.org>.
wecharyu commented on code in PR #8219:
URL: https://github.com/apache/hudi/pull/8219#discussion_r1143032159


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##########
@@ -141,28 +141,30 @@ trait ProvidesHoodieConfig extends Logging {
     // NOTE: Target operation could be overridden by the user, therefore if it has been provided as an input
     //       we'd prefer that value over auto-deduced operation. Otherwise, we deduce target operation type
     val operationOverride = combinedOpts.get(DataSourceWriteOptions.OPERATION.key)
-    val operation = operationOverride.getOrElse {
-      (enableBulkInsert, isOverwritePartition, isOverwriteTable, dropDuplicate, isNonStrictMode, isPartitionedTable) match {
-        case (true, _, _, _, false, _) =>
-          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert in ${insertMode.value()} mode.")
-        case (true, true, _, _, _, true) =>
-          throw new IllegalArgumentException(s"Insert Overwrite Partition can not use bulk insert.")
-        case (true, _, _, true, _, _) =>
-          throw new IllegalArgumentException(s"Bulk insert cannot support drop duplication." +
-            s" Please disable $INSERT_DROP_DUPS and try again.")
-        // if enableBulkInsert is true, use bulk insert for the insert overwrite non-partitioned table.
-        case (true, false, true, _, _, false) => BULK_INSERT_OPERATION_OPT_VAL
-        // insert overwrite table
-        case (false, false, true, _, _, _) => INSERT_OVERWRITE_TABLE_OPERATION_OPT_VAL
-        // insert overwrite partition
-        case (_, true, false, _, _, true) => INSERT_OVERWRITE_OPERATION_OPT_VAL
-        // disable dropDuplicate, and provide preCombineKey, use the upsert operation for strict and upsert mode.
-        case (false, false, false, false, false, _) if hasPrecombineColumn => UPSERT_OPERATION_OPT_VAL
-        // if table is pk table and has enableBulkInsert use bulk insert for non-strict mode.
-        case (true, _, _, _, true, _) => BULK_INSERT_OPERATION_OPT_VAL
-        // for the rest case, use the insert operation
-        case _ => INSERT_OPERATION_OPT_VAL
-      }
+    val operation = (operationOverride, enableBulkInsert, isOverwritePartition,
+                     isOverwriteTable, dropDuplicate, isNonStrictMode, isPartitionedTable, hasPrecombineColumn) match {
+      case (Some(UPSERT_OPERATION_OPT_VAL), _, _, _, _, _, _, false) =>
+        throw new IllegalArgumentException(s"Table without preCombineKey can not use upsert operation.")
+      case (Some(operation), _, _, _, _, _, _, _)  => operation

Review Comment:
   We put the operationOverride extraction after the specific checks. @danny0405 
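   
   The shape of the check under discussion can be sketched as follows. This is a simplified stand-in, not the PR's actual code: plain strings replace the Hudi option constants, and the real match covers many more flags (bulk insert, overwrite, drop-duplicate, strict mode, partitioning).
   
   ```scala
   // Simplified sketch of the resolution order discussed above: reject an
   // explicit "upsert" override when the table has no preCombineKey, otherwise
   // honor the user-configured operation, otherwise deduce a default.
   def resolveOperation(operationOverride: Option[String],
                        hasPrecombineColumn: Boolean): String =
     (operationOverride, hasPrecombineColumn) match {
       case (Some("upsert"), false) =>
         throw new IllegalArgumentException(
           "Table without preCombineKey can not use upsert operation.")
       case (Some(op), _) => op          // user-configured operation wins
       case (None, true)  => "upsert"    // deduced default when preCombineKey exists
       case (None, false) => "insert"    // deduced default otherwise
     }
   ```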





[GitHub] [hudi] hudi-bot commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1474027492

   ## CI report:
   
   * 64fab4bc69f94c44aeba7065adcaea5d24cf9973 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1479131595

   ## CI report:
   
   * 64fab4bc69f94c44aeba7065adcaea5d24cf9973 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15771) 
   * a36b85f579d2ef7f1584b932f4bba28e3207405d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15848) Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15850) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] wecharyu commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "wecharyu (via GitHub)" <gi...@apache.org>.
wecharyu commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1479104627

   @hudi-bot run azure




[GitHub] [hudi] hudi-bot commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1479148199

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "64fab4bc69f94c44aeba7065adcaea5d24cf9973",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15771",
       "triggerID" : "64fab4bc69f94c44aeba7065adcaea5d24cf9973",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a36b85f579d2ef7f1584b932f4bba28e3207405d",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15848",
       "triggerID" : "a36b85f579d2ef7f1584b932f4bba28e3207405d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a36b85f579d2ef7f1584b932f4bba28e3207405d",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15850",
       "triggerID" : "1479104627",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "a36b85f579d2ef7f1584b932f4bba28e3207405d",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15848",
       "triggerID" : "1479104627",
       "triggerType" : "MANUAL"
     } ]
   }-->
   ## CI report:
   
   * a36b85f579d2ef7f1584b932f4bba28e3207405d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15850) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15848) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] danny0405 commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8219:
URL: https://github.com/apache/hudi/pull/8219#discussion_r1145693723


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -150,20 +150,7 @@ object HoodieSparkSqlWriter {
       case _ => throw new HoodieException("hoodie only support org.apache.spark.serializer.KryoSerializer as spark.serializer")
     }
     val tableType = HoodieTableType.valueOf(hoodieConfig.getString(TABLE_TYPE))
-    var operation = WriteOperationType.fromValue(hoodieConfig.getString(OPERATION))
-    // TODO clean up
-    // It does not make sense to allow upsert() operation if INSERT_DROP_DUPS is true
-    // Auto-correct the operation to "insert" if OPERATION is set to "upsert" wrongly
-    // or not set (in which case it will be set as "upsert" by parametersWithWriteDefaults()) .
-    if (hoodieConfig.getBoolean(INSERT_DROP_DUPS) &&

Review Comment:
   Can we write some UTs to check this change?



##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##########
@@ -141,28 +141,30 @@ trait ProvidesHoodieConfig extends Logging {
     // NOTE: Target operation could be overridden by the user, therefore if it has been provided as an input
     //       we'd prefer that value over auto-deduced operation. Otherwise, we deduce target operation type
     val operationOverride = combinedOpts.get(DataSourceWriteOptions.OPERATION.key)
-    val operation = operationOverride.getOrElse {
-      (enableBulkInsert, isOverwritePartition, isOverwriteTable, dropDuplicate, isNonStrictMode, isPartitionedTable) match {
-        case (true, _, _, _, false, _) =>
-          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert in ${insertMode.value()} mode.")
-        case (true, true, _, _, _, true) =>
-          throw new IllegalArgumentException(s"Insert Overwrite Partition can not use bulk insert.")
-        case (true, _, _, true, _, _) =>
-          throw new IllegalArgumentException(s"Bulk insert cannot support drop duplication." +
-            s" Please disable $INSERT_DROP_DUPS and try again.")
-        // if enableBulkInsert is true, use bulk insert for the insert overwrite non-partitioned table.
-        case (true, false, true, _, _, false) => BULK_INSERT_OPERATION_OPT_VAL
-        // insert overwrite table
-        case (false, false, true, _, _, _) => INSERT_OVERWRITE_TABLE_OPERATION_OPT_VAL
-        // insert overwrite partition
-        case (_, true, false, _, _, true) => INSERT_OVERWRITE_OPERATION_OPT_VAL
-        // disable dropDuplicate, and provide preCombineKey, use the upsert operation for strict and upsert mode.
-        case (false, false, false, false, false, _) if hasPrecombineColumn => UPSERT_OPERATION_OPT_VAL
-        // if table is pk table and has enableBulkInsert use bulk insert for non-strict mode.
-        case (true, _, _, _, true, _) => BULK_INSERT_OPERATION_OPT_VAL
-        // for the rest case, use the insert operation
-        case _ => INSERT_OPERATION_OPT_VAL
-      }
+    val operation = (operationOverride, enableBulkInsert, isOverwritePartition,
+                     isOverwriteTable, dropDuplicate, isNonStrictMode, isPartitionedTable, hasPrecombineColumn) match {
+      case (Some(UPSERT_OPERATION_OPT_VAL), _, _, _, _, _, _, false) =>

Review Comment:
   Can we write some UTs to check this change?





[GitHub] [hudi] hudi-bot commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1482571419

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "64fab4bc69f94c44aeba7065adcaea5d24cf9973",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15771",
       "triggerID" : "64fab4bc69f94c44aeba7065adcaea5d24cf9973",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a36b85f579d2ef7f1584b932f4bba28e3207405d",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15848",
       "triggerID" : "a36b85f579d2ef7f1584b932f4bba28e3207405d",
       "triggerType" : "PUSH"
     }, {
       "hash" : "a36b85f579d2ef7f1584b932f4bba28e3207405d",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15850",
       "triggerID" : "1479104627",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "a36b85f579d2ef7f1584b932f4bba28e3207405d",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15848",
       "triggerID" : "1479104627",
       "triggerType" : "MANUAL"
     }, {
       "hash" : "3fed80ecdcfabf29904025fa84e2d08505351189",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15896",
       "triggerID" : "3fed80ecdcfabf29904025fa84e2d08505351189",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * a36b85f579d2ef7f1584b932f4bba28e3207405d Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15850) Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15848) 
   * 3fed80ecdcfabf29904025fa84e2d08505351189 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15896) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1474254403

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "64fab4bc69f94c44aeba7065adcaea5d24cf9973",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15771",
       "triggerID" : "64fab4bc69f94c44aeba7065adcaea5d24cf9973",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 64fab4bc69f94c44aeba7065adcaea5d24cf9973 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15771) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] wecharyu commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "wecharyu (via GitHub)" <gi...@apache.org>.
wecharyu commented on code in PR #8219:
URL: https://github.com/apache/hudi/pull/8219#discussion_r1144353953


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/hudi/ProvidesHoodieConfig.scala:
##########
@@ -141,28 +141,30 @@ trait ProvidesHoodieConfig extends Logging {
     // NOTE: Target operation could be overridden by the user, therefore if it has been provided as an input
     //       we'd prefer that value over auto-deduced operation. Otherwise, we deduce target operation type
     val operationOverride = combinedOpts.get(DataSourceWriteOptions.OPERATION.key)
-    val operation = operationOverride.getOrElse {
-      (enableBulkInsert, isOverwritePartition, isOverwriteTable, dropDuplicate, isNonStrictMode, isPartitionedTable) match {
-        case (true, _, _, _, false, _) =>
-          throw new IllegalArgumentException(s"Table with primaryKey can not use bulk insert in ${insertMode.value()} mode.")
-        case (true, true, _, _, _, true) =>
-          throw new IllegalArgumentException(s"Insert Overwrite Partition can not use bulk insert.")
-        case (true, _, _, true, _, _) =>
-          throw new IllegalArgumentException(s"Bulk insert cannot support drop duplication." +
-            s" Please disable $INSERT_DROP_DUPS and try again.")
-        // if enableBulkInsert is true, use bulk insert for the insert overwrite non-partitioned table.
-        case (true, false, true, _, _, false) => BULK_INSERT_OPERATION_OPT_VAL
-        // insert overwrite table
-        case (false, false, true, _, _, _) => INSERT_OVERWRITE_TABLE_OPERATION_OPT_VAL
-        // insert overwrite partition
-        case (_, true, false, _, _, true) => INSERT_OVERWRITE_OPERATION_OPT_VAL
-        // disable dropDuplicate, and provide preCombineKey, use the upsert operation for strict and upsert mode.
-        case (false, false, false, false, false, _) if hasPrecombineColumn => UPSERT_OPERATION_OPT_VAL
-        // if table is pk table and has enableBulkInsert use bulk insert for non-strict mode.
-        case (true, _, _, _, true, _) => BULK_INSERT_OPERATION_OPT_VAL
-        // for the rest case, use the insert operation
-        case _ => INSERT_OPERATION_OPT_VAL
-      }
+    val operation = (operationOverride, enableBulkInsert, isOverwritePartition,
+                     isOverwriteTable, dropDuplicate, isNonStrictMode, isPartitionedTable, hasPrecombineColumn) match {
+      case (Some(UPSERT_OPERATION_OPT_VAL), _, _, _, _, _, _, false) =>
+        throw new IllegalArgumentException(s"Table without preCombineKey can not use upsert operation.")
+      case (Some(operation), _, _, _, _, _, _, _)  => operation

Review Comment:
   The new code logic looks like:
   1. First try to match some specific cases for checking, and throw an exception if the config is wrong.
   2. If the checks in step 1 pass and `operationOverride` is defined, we just use it.
   3. If `operationOverride` is not defined, we try to deduce the operation.
   
   Actually we are fetching the overridden operation type value at the beginning, please correct me if I'm wrong. :-)
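   The three steps above could be sketched roughly like this (a minimal illustration only; `deduceOperation` and `resolveOperation` are hypothetical stand-ins for the real deduction logic in `ProvidesHoodieConfig`, which matches on many more flags):
   
   ```scala
   // Stand-in for the real per-table deduction rules.
   def deduceOperation(hasPrecombineColumn: Boolean): String =
     if (hasPrecombineColumn) "upsert" else "insert"
   
   def resolveOperation(operationOverride: Option[String],
                        hasPrecombineColumn: Boolean): String = {
     // Step 1: validate illegal combinations first, even for user overrides.
     if (operationOverride.contains("upsert") && !hasPrecombineColumn) {
       throw new IllegalArgumentException(
         "Table without preCombineKey can not use upsert operation.")
     }
     // Step 2: a validated override wins.
     // Step 3: otherwise, fall back to deducing the operation.
     operationOverride.getOrElse(deduceOperation(hasPrecombineColumn))
   }
   ```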





[GitHub] [hudi] danny0405 commented on a diff in pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on code in PR #8219:
URL: https://github.com/apache/hudi/pull/8219#discussion_r1147049916


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala:
##########
@@ -150,20 +150,7 @@ object HoodieSparkSqlWriter {
       case _ => throw new HoodieException("hoodie only support org.apache.spark.serializer.KryoSerializer as spark.serializer")
     }
     val tableType = HoodieTableType.valueOf(hoodieConfig.getString(TABLE_TYPE))
-    var operation = WriteOperationType.fromValue(hoodieConfig.getString(OPERATION))
-    // TODO clean up
-    // It does not make sense to allow upsert() operation if INSERT_DROP_DUPS is true
-    // Auto-correct the operation to "insert" if OPERATION is set to "upsert" wrongly
-    // or not set (in which case it will be set as "upsert" by parametersWithWriteDefaults()) .
-    if (hoodieConfig.getBoolean(INSERT_DROP_DUPS) &&

Review Comment:
   Just write a UT for the SQL writer to validate the config inference.
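   
   Something along these lines, perhaps (a rough sketch only, assuming a ScalaTest-style suite such as Hudi's `HoodieSparkSqlTestBase`; the helper `checkExceptionContain` and the exact table DDL are illustrative):
   
   ```scala
   test("Test upsert into table without preCombineField should fail") {
     withTempDir { tmp =>
       // Table with a primary key but no preCombineField.
       spark.sql(
         s"""
            |create table h0 (id int, name string, price double)
            |using hudi
            |tblproperties (primaryKey = 'id')
            |location '${tmp.getCanonicalPath}'
          """.stripMargin)
       // Force the upsert operation and expect the new validation to reject it.
       spark.sql("set hoodie.datasource.write.operation = upsert")
       checkExceptionContain("insert into h0 values (1, 'a1', 10.0)")(
         "Table without preCombineKey can not use upsert operation.")
     }
   }
   ```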





[GitHub] [hudi] hudi-bot commented on pull request #8219: [HUDI-5949] Check the write operation configured by user for better troubleshooting

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8219:
URL: https://github.com/apache/hudi/pull/8219#issuecomment-1474037956

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "64fab4bc69f94c44aeba7065adcaea5d24cf9973",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15771",
       "triggerID" : "64fab4bc69f94c44aeba7065adcaea5d24cf9973",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 64fab4bc69f94c44aeba7065adcaea5d24cf9973 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=15771) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>

