Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/26 16:43:37 UTC

[GitHub] [hudi] bvaradar commented on a change in pull request #2014: [HUDI-1153] Spark DataSource and Streaming Write must fail when operation type is misconfigured

bvaradar commented on a change in pull request #2014:
URL: https://github.com/apache/hudi/pull/2014#discussion_r477439176



##########
File path: hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
##########
@@ -248,15 +249,18 @@ public static HoodieWriteClient createHoodieClient(JavaSparkContext jssc, String
   }
 
   public static JavaRDD<WriteStatus> doWriteOperation(HoodieWriteClient client, JavaRDD<HoodieRecord> hoodieRecords,
-      String instantTime, String operation) throws HoodieException {
-    if (operation.equals(DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL())) {
+      String instantTime, WriteOperationType operation) throws HoodieException {
+    if (operation == WriteOperationType.BULK_INSERT) {

Review comment:
       Suggestion: instead of if-else, using a switch-case would let us take advantage of checkstyle to ensure we are not missing handling of any operation types.
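
       A minimal sketch of what the switch-case shape could look like in DataSourceUtils (illustrative only: the UPSERT case and the exception in the default branch are assumptions about where the PR might land, not its actual code):

   ```
     public static JavaRDD<WriteStatus> doWriteOperation(HoodieWriteClient client,
         JavaRDD<HoodieRecord> hoodieRecords, String instantTime,
         WriteOperationType operation) throws HoodieException {
       switch (operation) {
         case BULK_INSERT:
           Option<BulkInsertPartitioner> partitioner =
               createUserDefinedBulkInsertPartitioner(client.getConfig());
           return client.bulkInsert(hoodieRecords, instantTime, partitioner);
         case INSERT:
           return client.insert(hoodieRecords, instantTime);
         case UPSERT:
           return client.upsert(hoodieRecords, instantTime);
         default:
           // Any operation type not explicitly handled above fails fast here,
           // instead of silently falling through to a default behavior.
           throw new HoodieException("Unknown write operation: " + operation);
       }
     }
   ```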

##########
File path: hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
##########
@@ -248,15 +249,18 @@ public static HoodieWriteClient createHoodieClient(JavaSparkContext jssc, String
   }
 
   public static JavaRDD<WriteStatus> doWriteOperation(HoodieWriteClient client, JavaRDD<HoodieRecord> hoodieRecords,
-      String instantTime, String operation) throws HoodieException {
-    if (operation.equals(DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL())) {
+      String instantTime, WriteOperationType operation) throws HoodieException {

Review comment:
       Also, since we have changed this to an enum, can you also update DataSourceWriteOptions to reference it? The enum values and those defined in DataSourceWriteOptions are already consistent, so we can do this without any backwards-compatibility issues.
   
   ```
     val OPERATION_OPT_KEY = "hoodie.datasource.write.operation"
     val BULK_INSERT_OPERATION_OPT_VAL = WriteOperationType.BULK_INSERT.name
     val INSERT_OPERATION_OPT_VAL = WriteOperationType.INSERT.name
     val UPSERT_OPERATION_OPT_VAL = WriteOperationType.UPSERT.name
     val DELETE_OPERATION_OPT_VAL = WriteOperationType.DELETE.name
     val BOOTSTRAP_OPERATION_OPT_VAL = WriteOperationType.BOOTSTRAP.name
     val DEFAULT_OPERATION_OPT_VAL = UPSERT_OPERATION_OPT_VAL
   ```
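
   On the read-back side, the write path would then parse the configured string into the enum before calling doWriteOperation. A rough sketch (operationFor is a hypothetical helper, and WriteOperationType.fromValue(String) is assumed to map the option values back to the enum constants, throwing on anything unrecognized):

   ```
     // Hypothetical helper at the Spark write path boundary; `parameters`
     // stands in for the resolved write options map.
     private static WriteOperationType operationFor(java.util.Map<String, String> parameters) {
       String configured = parameters.getOrDefault(
           DataSourceWriteOptions.OPERATION_OPT_KEY(),
           DataSourceWriteOptions.DEFAULT_OPERATION_OPT_VAL());
       // Assumed to throw for misconfigured/typo'd values rather than
       // silently defaulting to upsert.
       return WriteOperationType.fromValue(configured);
     }
   ```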

##########
File path: hudi-spark/src/main/java/org/apache/hudi/DataSourceUtils.java
##########
@@ -248,15 +249,18 @@ public static HoodieWriteClient createHoodieClient(JavaSparkContext jssc, String
   }
 
   public static JavaRDD<WriteStatus> doWriteOperation(HoodieWriteClient client, JavaRDD<HoodieRecord> hoodieRecords,
-      String instantTime, String operation) throws HoodieException {
-    if (operation.equals(DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL())) {
+      String instantTime, WriteOperationType operation) throws HoodieException {
+    if (operation == WriteOperationType.BULK_INSERT) {
       Option<BulkInsertPartitioner> userDefinedBulkInsertPartitioner =
           createUserDefinedBulkInsertPartitioner(client.getConfig());
       return client.bulkInsert(hoodieRecords, instantTime, userDefinedBulkInsertPartitioner);
-    } else if (operation.equals(DataSourceWriteOptions.INSERT_OPERATION_OPT_VAL())) {
+    } else if (operation == WriteOperationType.INSERT) {
       return client.insert(hoodieRecords, instantTime);
     } else {
       // default is upsert
+      if (operation != WriteOperationType.UPSERT) {

Review comment:
       @sreeram26: I think it is better to throw an exception in this case. We saw some cases where users intended to use bulk insert but had a typo in the operation type.
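
       Sketching the fail-fast shape of the tail of that if-else (the exception message is illustrative; the UPSERT branch replaces the silent "default is upsert" path quoted above):

   ```
     } else if (operation == WriteOperationType.UPSERT) {
       return client.upsert(hoodieRecords, instantTime);
     } else {
       // No silent fallback: a misconfigured or typo'd operation type
       // fails the write up front instead of quietly upserting.
       throw new HoodieException("Unsupported write operation: " + operation);
     }
   ```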




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org