Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/01/04 05:46:20 UTC

[GitHub] [hudi] afeldman1 removed a comment on issue #2399: [SUPPORT] Hudi deletes not being properly commited

afeldman1 removed a comment on issue #2399:
URL: https://github.com/apache/hudi/issues/2399#issuecomment-753769016


   @bvaradar I believe so; I used the same key fields in both the initial write and the delete write.
   Initial Hudi options:
   ```
       val hudiOptions = Map[String,String](
         DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY -> "glue_db_name",
         HoodieWriteConfig.TABLE_NAME -> "test_tbl_nm",
         DataSourceWriteOptions.TABLE_TYPE_OPT_KEY -> DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL,
         DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL,
         DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> classOf[ComplexKeyGenerator].getName,
         DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "col_c",
         DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "col_b,col_c",
         DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "processed_time",
         DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
         DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> "test_tbl_nm",
         DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "col_b,col_c",
         DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getCanonicalName,
         DataSourceWriteOptions.HIVE_URL_OPT_KEY -> s"jdbc:hive2://${driverHostName}:10000",
         DataSourceWriteOptions.INSERT_DROP_DUPS_OPT_KEY -> "true"
       )
   ```
   
   Deletion options:
   ```
         DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY -> "glue_db_name",
         HoodieWriteConfig.TABLE_NAME -> "test_tbl_nm",
         DataSourceWriteOptions.TABLE_TYPE_OPT_KEY -> DataSourceWriteOptions.COW_TABLE_TYPE_OPT_VAL,
         DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.DELETE_OPERATION_OPT_VAL,
         DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "col_c",
         DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "col_b,col_c",
         DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
         DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> "test_tbl_nm",
         DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "col_b,col_c",
         DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getCanonicalName,
         DataSourceWriteOptions.HIVE_URL_OPT_KEY -> s"jdbc:hive2://${driverHostName}:10000"
   ```
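   
   For comparison, one way to guarantee that the key settings are identical across the two writes is to build both option maps from a shared base. The following is only a minimal sketch: it uses the plain string config keys in place of the `DataSourceWriteOptions` constants so that it runs without Hudi on the classpath, and the object name `SharedHudiKeys` is hypothetical. Note that in the two listings above, the key generator class is set only in the initial options; the sketch assumes it should be the same for both writes, which may or may not be relevant to this issue.
   ```scala
   // Sketch only: string keys stand in for the DataSourceWriteOptions constants.
   object SharedHudiKeys {
     // Options that determine record identity; assumed to be shared by both writes.
     val keyOptions: Map[String, String] = Map(
       "hoodie.table.name" -> "test_tbl_nm",
       "hoodie.datasource.write.table.type" -> "COPY_ON_WRITE",
       "hoodie.datasource.write.keygenerator.class" -> "org.apache.hudi.keygen.ComplexKeyGenerator",
       "hoodie.datasource.write.recordkey.field" -> "col_c",
       "hoodie.datasource.write.partitionpath.field" -> "col_b,col_c"
     )

     // Initial write: shared key options plus the upsert-specific settings.
     val upsertOptions: Map[String, String] = keyOptions ++ Map(
       "hoodie.datasource.write.operation" -> "upsert",
       "hoodie.datasource.write.precombine.field" -> "processed_time"
     )

     // Delete write: the same key options, only the operation differs.
     val deleteOptions: Map[String, String] =
       keyOptions + ("hoodie.datasource.write.operation" -> "delete")

     // Key-related options whose values differ between the two writes.
     val mismatched: Iterable[String] =
       keyOptions.keys.filter(k => upsertOptions.get(k) != deleteOptions.get(k))

     def main(args: Array[String]): Unit =
       println(s"mismatched key options: ${mismatched.mkString(", ")}")
   }
   ```
   The Hive sync options would be added to both maps in the same way; they are left out here to keep the comparison focused on the key fields.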
   
   Additionally, in case this helps, the .inflight commit log file also has content:
   ```
   {
     "partitionToWriteStats" : { },
     "compacted" : false,
     "extraMetadata" : { },
     "operationType" : "DELETE",
     "totalRecordsDeleted" : 0,
     "totalLogRecordsCompacted" : 0,
     "fileIdAndRelativePaths" : { },
     "totalScanTime" : 0,
     "totalCreateTime" : 0,
     "totalUpsertTime" : 0,
     "totalCompactedRecordsUpdated" : 0,
     "totalLogFilesCompacted" : 0,
     "totalLogFilesSize" : 0
   }
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org