Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/25 08:33:06 UTC

[GitHub] [hudi] ayush71994 opened a new issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

ayush71994 opened a new issue #2992:
URL: https://github.com/apache/hudi/issues/2992


   **_Tips before filing an issue_**
   
   - Have you gone through our [FAQs](https://cwiki.apache.org/confluence/display/HUDI/FAQ)?
     - Yes. 
   - Join the mailing list to engage in conversations and get faster support at dev-subscribe@hudi.apache.org.
     -  Requested 
   
   - If you have triaged this as a bug, then file an [issue](https://issues.apache.org/jira/projects/HUDI/issues) directly.
     -   Pretty sure it's a bug; need confirmation from the devs.
   
   **Describe the problem you faced**
   
   We are using EMR-5.33 with Hudi 0.7.0.
   We are seeing an issue with the behaviour of insert_overwrite. According to the [documentation](https://hudi.apache.org/docs/quick-start-guide.html#insert-overwrite), insert_overwrite can be used to overwrite specific partitions.
   However, if the incoming dataframe contains records that are already present in the Hudi table partition we are trying to overwrite, those records are missing from the overwritten partition.
   If all the records in the incoming dataframe match records in the table partition, no write takes place and the partition is not overwritten. In the case of duplicate records or bad data, the data already present in the partition is not deleted.
   This behaviour seems different from what is described in the documentation.
   
   **To Reproduce**
   
   Steps to reproduce the behavior:
   
   1. Create a partitioned dataframe with duplicate records in one or more partitions
   2. Use Hudi bulk insert with "delete.duplicates=false" to create the table
   3. Use Hudi insert_overwrite with "delete.duplicates=true" and the corrected dataframe, which has no duplicate records
   4. The count does not reduce in the impacted partitions
   5. The replace commit shows 0 bytes written
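   
   The two writes in steps 2 and 3 might correspond to option sets along these lines. This is only a sketch: "delete.duplicates" is not a Hudi key, so it is assumed here to mean `hoodie.datasource.write.insert.drop.duplicates` (the key behind `INSERT_DROP_DUPS_OPT_KEY`), and the object and value names are hypothetical.
   
   ```scala
   // Hypothetical helper; only the string keys and operation values are Hudi's.
   object ReproOptions {
     val OperationKey = "hoodie.datasource.write.operation"
     val DropDupsKey  = "hoodie.datasource.write.insert.drop.duplicates"
   
     // Step 2: initial load with bulk_insert, duplicates kept.
     val bulkInsert: Map[String, String] = Map(
       OperationKey -> "bulk_insert",
       DropDupsKey  -> "false"
     )
   
     // Step 3: overwrite of the affected partitions with the drop-duplicates flag on
     // (assumed meaning of "delete.duplicates=true").
     val insertOverwrite: Map[String, String] = Map(
       OperationKey -> "insert_overwrite",
       DropDupsKey  -> "true"
     )
   }
   
   // Assumed usage with a Spark DataFrame `df`, the `hudiOptions` map from
   // "Additional context" and a table path `basePath` (none of them defined here):
   //   df.write.format("hudi")
   //     .options(hudiOptions ++ ReproOptions.insertOverwrite)
   //     .mode(SaveMode.Append)
   //     .save(basePath)
   ```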
   
   **Expected behavior**
   
   Expected insert_overwrite to delete the older data and replace it with the new dataframe, which has no duplicate records.
   
   **Environment Description**
   
   * Hudi version : 0.7.0
   
   * Spark version : 2.4.0
   
   * Hive version : 1.2.2
   
   * Hadoop version : 2.10
   
   * EMR : 5.33
   
   * Storage (HDFS/S3/GCS..) : S3 and hive sync to Glue
   
   * Running on Docker? (yes/no) : Running on EMR using a fat jar
   
   
   **Additional context**
   
   Hudi config used
   
   ```
   val hudiOptions = Map[String,String](
       HoodieWriteConfig.TABLE_NAME -> tableName,
       DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "region_id,isbn,order_id",
       DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "region_id:SIMPLE,order_day:TIMESTAMP",
       DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "ecs_version",
       DataSourceWriteOptions.TABLE_NAME_OPT_KEY -> tableName,
       DataSourceWriteOptions.KEYGENERATOR_CLASS_OPT_KEY -> classOf[CustomKeyGenerator].getCanonicalName,
       HoodieWriteConfig.INSERT_PARALLELISM -> "5000",
       HoodieWriteConfig.BULKINSERT_PARALLELISM -> "5000",
       HoodieWriteConfig.UPSERT_PARALLELISM -> "5000",
       DataSourceWriteOptions.INSERT_DROP_DUPS_OPT_KEY -> "false",
       "hoodie.parquet.max.file.size" -> DEFAULT_HUDI_FILESIZE,
       // 256MB
       "hoodie.parquet.block.size" -> DEFAULT_HUDI_FILESIZE,
       "hoodie.datasource.write.hive_style_partitioning" -> "true",
       "hoodie.cleaner.commits.retained" -> "2",
       HoodieCompactionConfig.PARQUET_SMALL_FILE_LIMIT_BYTES -> DEFAULT_SMALL_FILESIZE,
       /* hive sync settings */
       DataSourceWriteOptions.HIVE_SYNC_ENABLED_OPT_KEY -> "true",
       DataSourceWriteOptions.HIVE_PARTITION_FIELDS_OPT_KEY -> "region_id,order_day",
       DataSourceWriteOptions.HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY -> classOf[MultiPartKeysValueExtractor].getName,
       DataSourceWriteOptions.HIVE_ASSUME_DATE_PARTITION_OPT_KEY -> "false",
       DataSourceWriteOptions.HIVE_DATABASE_OPT_KEY -> glueDatabase,
       DataSourceWriteOptions.HIVE_TABLE_OPT_KEY -> tableName,
       "hoodie.datasource.hive_sync.support_timestamp" -> "true",
       "hoodie.deltastreamer.keygen.timebased.timestamp.type" -> "SCALAR",
       "hoodie.deltastreamer.keygen.timebased.timestamp.scalar.time.unit" -> "microseconds",
       "hoodie.deltastreamer.keygen.timebased.timezone" -> "UTC",
       "hoodie.deltastreamer.keygen.timebased.input.dateformat" -> "yyyy-MM-dd HH:mm:ss",
       "hoodie.deltastreamer.keygen.timebased.output.dateformat" -> "yyyy-MM-dd HH:mm:ss"
   )
   
   ```
   Replace Commits
   
   From Hudi CLI
   ```
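   ║ 20210524212401 │ 0.0 B               │ 0                 │ 0                   │ 0                        │ 0                     │ 0                            │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20210524201042 │ 0.0 B               │ 0                 │ 0                   │ 0                        │ 0                     │ 0                            │ 0            ║
   ╟────────────────┼─────────────────────┼───────────────────┼─────────────────────┼──────────────────────────┼───────────────────────┼──────────────────────────────┼──────────────╢
   ║ 20210524165247 │ 1.2 GB              │ 2000              │ 0                   │ 1                        │ 2852837               │ 0                            │ 0            ║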
   ```
   
   Contents of Replace commit
   ```
   {
     "partitionToWriteStats" : { },
     "compacted" : false,
     "extraMetadata" : {
       "schema" : "{\"type\":\"record\",\"name\":\"slim_table_record\",\"namespace\":\"hoodie.slim_table\",\"fields\":[{\"name\":\"_bdt_region_id\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record._bdt_region_id\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":0},\"null\"]},{\"name\":\"isbn\",\"type\":[\"string\",\"null\"]},{\"name\":\"order_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"_bdt_order_day\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"gl_product_group\",\"type\":[\"int\",\"null\"]},{\"name\":\"warehouse_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"legal_entity_id\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.legal_entity_id\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":0},\"null\"]},{\"name\":\"client_external_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"supplier_ord
 er_type_id\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.supplier_order_type_id\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"inventory_owner_type_id\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.inventory_owner_type_id\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"organizational_unit_id\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.organizational_unit_id\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"inventory_fiscal_owner_id\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.inventory_fiscal_owner_id\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"inventory_owner_group
 _id\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.inventory_owner_group_id\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"sourcing_plan_id\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.sourcing_plan_id\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"purchasing_chain_link_id\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.purchasing_chain_link_id\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"purchasing_chain_name\",\"type\":[\"string\",\"null\"]},{\"name\":\"purch_chain_link_type_code\",\"type\":[\"string\",\"null\"]},{\"name\":\"purchasing_chain_id\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.purchasing_chain_id\",\
 "size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"item_authority_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"distributor_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"order_datetime\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"vendor_external_id_type\",\"type\":[\"int\",\"null\"]},{\"name\":\"vendor_external_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"scan_external_id_type\",\"type\":[\"int\",\"null\"]},{\"name\":\"scan_external_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"opa_reference_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"opa_request_id\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.opa_request_id\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"incoterm\",\"type\":[\"string\",\"null\"]},{\"name\":\"marketplace_id\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\"
 ,\"namespace\":\"hoodie.slim_table.slim_table_record.marketplace_id\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":0},\"null\"]},{\"name\":\"country\",\"type\":[\"string\",\"null\"]},{\"name\":\"is_retail\",\"type\":[\"string\",\"null\"]},{\"name\":\"is_goldfish\",\"type\":[\"string\",\"null\"]},{\"name\":\"is_jersey\",\"type\":[\"string\",\"null\"]},{\"name\":\"is_dropship\",\"type\":[\"string\",\"null\"]},{\"name\":\"allocation_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"allocation_time\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"batch_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"carton_qty\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.carton_qty\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"container_id\",\"type\":[\"int\",\"null\"]},{\"name\":\"currency_code\",\"type\":[\"string\",\"null\"]},{\"n
 ame\":\"current_qty\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.current_qty\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"end_shipping_window\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"executed_qty\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.executed_qty\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"expected_lead_time\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.expected_lead_time\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"hidden\",\"type\":[\"int\",\"null\"]},{\"name\":\"initial_qty\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.initial_qty\",\"size\":16,\
 "logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"landed_cost\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.landed_cost\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"nbd_source\",\"type\":[\"string\",\"null\"]},{\"name\":\"need_by_date\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"override_cost\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.override_cost\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"override_qty\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.override_qty\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"override_reason\",\"type\":[\"string\",\"null\"]},{\"name\":\"plan_id\",\"type\
 ":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.plan_id\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"pre_capacity_qty\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.pre_capacity_qty\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"request_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"sorttype\",\"type\":[\"string\",\"null\"]},{\"name\":\"start_shipping_window\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"vendor_code\",\"type\":[\"string\",\"null\"]},{\"name\":\"window_type\",\"type\":[\"string\",\"null\"]},{\"name\":\"external_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"group_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"internal_category\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_tab
 le_record.internal_category\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"internal_status\",\"type\":[\"string\",\"null\"]},{\"name\":\"inventory_mgmt_scope_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"last_internal_status\",\"type\":[\"string\",\"null\"]},{\"name\":\"last_status\",\"type\":[\"string\",\"null\"]},{\"name\":\"process_log\",\"type\":[\"string\",\"null\"]},{\"name\":\"request_creation_time\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"request_last_updated\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"request_minimal\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.request_minimal\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":18},\"null\"]},{\"name\":\"request_type\",\"type\":[\"string\",\"null\"]},{\"name\":\"requester\",\"type\":[\"string\",\"null\"
 ]},{\"name\":\"require_approval\",\"type\":[\"int\",\"null\"]},{\"name\":\"scheduled_execution_time\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"scheduled_order_date\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"source\",\"type\":[\"string\",\"null\"]},{\"name\":\"special_type\",\"type\":[\"string\",\"null\"]},{\"name\":\"user_to_notify\",\"type\":[\"string\",\"null\"]},{\"name\":\"constrained_order_date\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"domain\",\"type\":[\"string\",\"null\"]},{\"name\":\"execution_date\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"plan_execution_status\",\"type\":[\"string\",\"null\"]},{\"name\":\"plan_execution_uuid\",\"type\":[\"string\",\"null\"]},{\"name\":\"computation_date\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"c
 omputation_uuid\",\"type\":[\"string\",\"null\"]},{\"name\":\"compute_day\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"confirmed_quantity\",\"type\":[\"long\",\"null\"]},{\"name\":\"constrained_order_quantity\",\"type\":[\"long\",\"null\"]},{\"name\":\"context_name\",\"type\":[\"string\",\"null\"]},{\"name\":\"fulfillment_network_sku\",\"type\":[\"string\",\"null\"]},{\"name\":\"plan_status\",\"type\":[\"string\",\"null\"]},{\"name\":\"plan_uuid\",\"type\":[\"string\",\"null\"]},{\"name\":\"purchase_order_type_code\",\"type\":[\"string\",\"null\"]},{\"name\":\"unconstrained_order_date\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"unconstrained_order_quantity\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.unconstrained_order_quantity\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":2},\"null\"]},{\"name\":\"vendor_link
 _descriptor\",\"type\":[\"string\",\"null\"]},{\"name\":\"aggregation_type\",\"type\":[\"string\",\"null\"]},{\"name\":\"arr_sup_in_lead_time_qty\",\"type\":[\"long\",\"null\"]},{\"name\":\"arr_sup_in_plan_horizon_qty\",\"type\":[\"long\",\"null\"]},{\"name\":\"arr_sup_in_post_plan_hor_qty\",\"type\":[\"long\",\"null\"]},{\"name\":\"arr_sup_overdue_qty\",\"type\":[\"long\",\"null\"]},{\"name\":\"arr_trans_in_lead_time_qty\",\"type\":[\"long\",\"null\"]},{\"name\":\"arr_trans_in_plan_horizon_qty\",\"type\":[\"long\",\"null\"]},{\"name\":\"arr_trans_in_post_plan_hor_qty\",\"type\":[\"long\",\"null\"]},{\"name\":\"arr_trans_overdue_qty\",\"type\":[\"long\",\"null\"]},{\"name\":\"arrival_smoothing_pad_days\",\"type\":[\"long\",\"null\"]},{\"name\":\"auto_order_andon_cord_flag\",\"type\":[\"string\",\"null\"]},{\"name\":\"buying_period_adjust_in_days\",\"type\":[\"string\",\"null\"]},{\"name\":\"buying_period_in_days\",\"type\":[\"long\",\"null\"]},{\"name\":\"buying_period_policy\",\"ty
 pe\":[\"string\",\"null\"]},{\"name\":\"carton_quantity\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.carton_quantity\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":14,\"scale\":2},\"null\"]},{\"name\":\"critical_ratio\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.critical_ratio\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":4},\"null\"]},{\"name\":\"critical_ratio_source\",\"type\":[\"string\",\"null\"]},{\"name\":\"cross_border_cost_constant\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.cross_border_cost_constant\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":2},\"null\"]},{\"name\":\"cross_border_cost_multiplier\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.cross_border_cost_multiplier\",\"size\":6,\"logica
 lType\":\"decimal\",\"precision\":12,\"scale\":2},\"null\"]},{\"name\":\"current_inventory_quantity\",\"type\":[\"long\",\"null\"]},{\"name\":\"daily_forecast_mean\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.daily_forecast_mean\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":2},\"null\"]},{\"name\":\"demand_lead_time_mean\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.demand_lead_time_mean\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":2},\"null\"]},{\"name\":\"demand_lead_time_stddev\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.demand_lead_time_stddev\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":2},\"null\"]},{\"name\":\"demand_plan_horizon_mean\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.de
 mand_plan_horizon_mean\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":2},\"null\"]},{\"name\":\"expected_demand_for_bp\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.expected_demand_for_bp\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":4},\"null\"]},{\"name\":\"fas_factor\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.fas_factor\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":6},\"null\"]},{\"name\":\"feedback_quantity\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.feedback_quantity\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":14,\"scale\":2},\"null\"]},{\"name\":\"holding_cost_factor\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.holding_cost_factor\",\"size\":4,\"logicalType\":\"decimal\",\
 "precision\":9,\"scale\":8},\"null\"]},{\"name\":\"holiday_cutoff\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"item_avg_dph\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.item_avg_dph\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":2},\"null\"]},{\"name\":\"item_binding\",\"type\":[\"long\",\"null\"]},{\"name\":\"item_birth_date\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"item_birth_date_source\",\"type\":[\"string\",\"null\"]},{\"name\":\"item_parent_asin\",\"type\":[\"string\",\"null\"]},{\"name\":\"item_platform\",\"type\":[\"string\",\"null\"]},{\"name\":\"item_previous_demand\",\"type\":[\"long\",\"null\"]},{\"name\":\"item_product_category_code\",\"type\":[\"string\",\"null\"]},{\"name\":\"item_product_sub_category_code\",\"type\":[\"string\",\"null\"]},{\"name\":\"item_product_type\",\"type\":[\"string\",\"n
 ull\"]},{\"name\":\"item_sortability\",\"type\":[\"string\",\"null\"]},{\"name\":\"item_tier\",\"type\":[\"string\",\"null\"]},{\"name\":\"lead_time_at_cr\",\"type\":[\"long\",\"null\"]},{\"name\":\"lead_time_end_date\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"lead_time_mean\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.lead_time_mean\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":2},\"null\"]},{\"name\":\"lead_time_stddev\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.lead_time_stddev\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":2},\"null\"]},{\"name\":\"manufacture_on_demand_flag\",\"type\":[\"string\",\"null\"]},{\"name\":\"markdown_flag\",\"type\":[\"string\",\"null\"]},{\"name\":\"minimum_order_quantity\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie
 .slim_table.slim_table_record.minimum_order_quantity\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":14,\"scale\":2},\"null\"]},{\"name\":\"mrrp_mean_lt_demand_fraction\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.mrrp_mean_lt_demand_fraction\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":14,\"scale\":2},\"null\"]},{\"name\":\"mrrp_reduction_factor\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.mrrp_reduction_factor\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":14,\"scale\":2},\"null\"]},{\"name\":\"national_minimum_order_qty\",\"type\":[\"long\",\"null\"]},{\"name\":\"need_by_date_source\",\"type\":[\"string\",\"null\"]},{\"name\":\"net_transfer_supply\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.net_transfer_supply\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":14,\"scale\":2},\"n
 ull\"]},{\"name\":\"offer_price\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.offer_price\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":2},\"null\"]},{\"name\":\"plan_operation_type_code\",\"type\":[\"string\",\"null\"]},{\"name\":\"plan_vendor_code\",\"type\":[\"string\",\"null\"]},{\"name\":\"preferred_vendor_discount\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.preferred_vendor_discount\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":6},\"null\"]},{\"name\":\"reorder_plan_uuid\",\"type\":[\"string\",\"null\"]},{\"name\":\"replenishment_category\",\"type\":[\"string\",\"null\"]},{\"name\":\"replenishment_policy\",\"type\":[\"string\",\"null\"]},{\"name\":\"review_period_days\",\"type\":[\"long\",\"null\"]},{\"name\":\"scheduling_descriptor_override\",\"type\":[\"string\",\"null\"]},{\"name\":\"target_instock_confidence\",\"typ
 e\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.target_instock_confidence\",\"size\":3,\"logicalType\":\"decimal\",\"precision\":5,\"scale\":4},\"null\"]},{\"name\":\"target_inventory_for_lead_time\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.target_inventory_for_lead_time\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":14,\"scale\":2},\"null\"]},{\"name\":\"target_inventory_for_plan_hor\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.target_inventory_for_plan_hor\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":14,\"scale\":2},\"null\"]},{\"name\":\"target_inventory_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"transfer_suggestion_details\",\"type\":[\"string\",\"null\"]},{\"name\":\"transfer_time\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.transfer_
 time\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":14,\"scale\":2},\"null\"]},{\"name\":\"unfilled_demand_quantity\",\"type\":[\"long\",\"null\"]},{\"name\":\"vendor_selection_strategy\",\"type\":[\"string\",\"null\"]},{\"name\":\"vendor_selection_strategy_src\",\"type\":[\"string\",\"null\"]},{\"name\":\"buying_intent\",\"type\":[\"string\",\"null\"]},{\"name\":\"dw_last_updated\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"dw_creation_date\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"demand_for_bp_stddev\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.demand_for_bp_stddev\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":4},\"null\"]},{\"name\":\"reorder_calculation_source\",\"type\":[\"string\",\"null\"]},{\"name\":\"national_unconstrained_qty\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":
 \"hoodie.slim_table.slim_table_record.national_unconstrained_qty\",\"size\":6,\"logicalType\":\"decimal\",\"precision\":12,\"scale\":2},\"null\"]},{\"name\":\"target_inventory_calc_type\",\"type\":[\"string\",\"null\"]},{\"name\":\"vendor_outage_padding_in_days\",\"type\":[\"long\",\"null\"]},{\"name\":\"routed_fas_factors\",\"type\":[\"string\",\"null\"]},{\"name\":\"point_forecast_strategy\",\"type\":[\"string\",\"null\"]},{\"name\":\"can_order_eaches\",\"type\":[\"string\",\"null\"]},{\"name\":\"ipa_allocation_id\",\"type\":[\"string\",\"null\"]},{\"name\":\"buying_intent_workflow\",\"type\":[\"string\",\"null\"]},{\"name\":\"po_condition\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.po_condition\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":0},\"null\"]},{\"name\":\"order_type\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.order_type\",\"size\":
 16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":0},\"null\"]},{\"name\":\"handler\",\"type\":[\"string\",\"null\"]},{\"name\":\"region_id\",\"type\":[{\"type\":\"fixed\",\"name\":\"fixed\",\"namespace\":\"hoodie.slim_table.slim_table_record.region_id\",\"size\":16,\"logicalType\":\"decimal\",\"precision\":38,\"scale\":0},\"null\"]},{\"name\":\"order_day\",\"type\":[{\"type\":\"long\",\"logicalType\":\"timestamp-micros\"},\"null\"]},{\"name\":\"ecs_snapshot\",\"type\":[\"long\",\"null\"]},{\"name\":\"ecs_version\",\"type\":[\"long\",\"null\"]},{\"name\":\"ecs_bundle_type\",\"type\":[\"string\",\"null\"]}]}"
     },
     "operationType" : "INSERT_OVERWRITE",
     "partitionToReplaceFileIds" : { },
     "fileIdAndRelativePaths" : { },
     "totalRecordsDeleted" : 0,
     "totalLogRecordsCompacted" : 0,
     "totalLogFilesCompacted" : 0,
     "totalCompactedRecordsUpdated" : 0,
     "totalLogFilesSize" : 0,
     "totalScanTime" : 0,
     "totalCreateTime" : 0,
     "totalUpsertTime" : 0,
     "minAndMaxEventTime" : {
       "Optional.empty" : {
         "val" : null,
         "present" : false
       }
     },
     "writePartitionPaths" : [ ]
   }
   ```
   
   **Stacktrace**
   
   ```No Errors```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] nsivabalan commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-991914495


   @ayush71994 : Can you respond with your latest findings when you get a chance? We would like to get to the bottom of this.





[GitHub] [hudi] nsivabalan commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-930092640









[GitHub] [hudi] nsivabalan commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-997572321


   Closing the issue as we could not reproduce it. Feel free to re-open if you are still facing the issue. We would be happy to assist.





[GitHub] [hudi] nsivabalan commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-930123547


   I could not reproduce in latest master. 
   https://gist.github.com/nsivabalan/23caa2f57c41bc9356ed7fa29590c147
   
   Here is my understanding. 
   INSERT_DROP_DUPES will delete records from incoming df with those matching in existing hudi table. when this is used along with INSERT_OVERRIDE operation, first insert_drop_dupes kicks in and so, possible some records from incoming batch will be dropped. and then INSERT_OVERRIDE is performed. and any matching partitions will be overritten. In my gist link, I did not use insert_drop_dupes for INSERT_OVERRIDE, just to show that it works. You need to set combine.before.insert/upsert to true to drop duplicates among incoming batch. 
   
   Here is the output if I use insert_drop_dupes with insert_overwrite: 
   
   +------+---------+---+
   |typeId|recordKey|str|
   +------+---------+---+
   |2     |key4     |mno|
   |1     |key1     |def|
   |3     |key5     |pqr|
   +------+---------+---+
   
   As you can see, key2 is not present here because it was dropped: it was already in the hudi table. 
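
The ordering described above (drop duplicates first, then overwrite the touched partitions) can be sketched in plain Python. This is only a model of the described behavior, not Hudi code: the rows are hypothetical and are not the data from the gist, `typeId` is assumed to be the partition path, and matching on (partition, record key) is an assumption about how duplicates are identified.

```python
# Rows are (typeId, recordKey, str); typeId doubles as the partition path.
# All data below is hypothetical and is not taken from the gist.
existing = [
    (1, "key2", "abc"),
    (1, "key3", "ghi"),
]
incoming = [
    (1, "key1", "def"),
    (1, "key2", "jkl"),
    (2, "key4", "mno"),
    (3, "key5", "pqr"),
]


def drop_dupes(batch, table):
    """Drop incoming rows whose (partition, key) already exists in the table."""
    seen = {(part, key) for part, key, _ in table}
    return [row for row in batch if (row[0], row[1]) not in seen]


def insert_overwrite(table, batch):
    """Replace every partition that the batch touches; leave the others alone."""
    touched = {row[0] for row in batch}
    return [row for row in table if row[0] not in touched] + list(batch)


# Step 1: the drop-duplicates filter runs against the existing table.
filtered = drop_dupes(incoming, existing)
# Step 2: the surviving rows overwrite the partitions they belong to.
result = insert_overwrite(existing, filtered)
print(sorted(result))
```

In this model key2 is filtered out before the overwrite, so partition 1 ends up holding only key1. If every incoming row matches an existing row, the filtered batch is empty, no partition is touched, and the table is left unchanged, which is consistent with the symptom in the original report.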
   
   
   





[GitHub] [hudi] nsivabalan closed issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
nsivabalan closed issue #2992:
URL: https://github.com/apache/hudi/issues/2992


   





[GitHub] [hudi] nsivabalan edited a comment on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-847899578


   @ayush71994 : 
   1. May I know which config you are referring to with "delete.duplicates"? Can you point me to the full config name from https://hudi.apache.org/docs/configurations.html? Do you mean https://hudi.apache.org/docs/configurations.html#INSERT_DROP_DUPS_OPT_KEY ? 
   2. With your insert overwrite operation, does your new dataframe have duplicates that you wish to dedup before overwriting? 
   3. Can you confirm that the hudi table had data in the partitions matching the batch used for insert_overwrite?





[GitHub] [hudi] nsivabalan commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-848732572


   Thanks @am-cpp. @satishkotha: would appreciate it if you could take a look at the issue. 





[GitHub] [hudi] am-cpp edited a comment on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
am-cpp edited a comment on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-848497460


   @nsivabalan 
   
   1. Yes, the configuration is https://hudi.apache.org/docs/configurations.html#INSERT_DROP_DUPS_OPT_KEY, which is set to **true**.
   2. Yes, the incoming dataframe has multiple records for the same primary key, which we want to pre-combine/drop based on the column set via the https://hudi.apache.org/docs/configurations.html#PRECOMBINE_FIELD_OPT_KEY config.
   3. Yes, the partition and the incoming dataframe have matching data.
   





[GitHub] [hudi] vinothchandar commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-867206424


   Folks, what's the next step here? 





[GitHub] [hudi] nsivabalan edited a comment on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-847899578


   @ayush71994 : 
   1. May I know which config you are referring to with "delete.duplicates"? Can you point me to the full config name from https://hudi.apache.org/docs/configurations.html? Do you mean https://hudi.apache.org/docs/configurations.html#INSERT_DROP_DUPS_OPT_KEY ? 
   2. With your insert overwrite operation, does your new dataframe have duplicates that you wish to dedup before overwriting? 
   3. Can you confirm that the hudi table had data in the partitions matching the batch used for insert_overwrite?
   
   CC @satishkotha 





[GitHub] [hudi] nsivabalan commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-848734380


   While satish investigates, one more question to narrow down the root cause. If you don't set https://hudi.apache.org/docs/configurations.html#INSERT_DROP_DUPS_OPT_KEY, are your records intact? I mean, the new batch overwrites all data in the matching partitions and your reads return only the new records, except that any duplicates within the batch will still be present. Can you confirm this behavior?
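
For reference, the configuration being asked about (insert_overwrite without the drop-duplicates flag) might look like the following PySpark-style option map. This is a sketch under assumptions: the table name, the `ts` precombine column, and the column names `recordKey` and `typeId` are placeholders, and whether `hoodie.combine.before.insert` is honored for insert_overwrite on 0.7.0 has not been verified here.

```python
# Hypothetical write options for an insert_overwrite without drop-duplicates.
hudi_options = {
    "hoodie.table.name": "example_table",                    # placeholder name
    "hoodie.datasource.write.recordkey.field": "recordKey",  # placeholder column
    "hoodie.datasource.write.partitionpath.field": "typeId",  # placeholder column
    "hoodie.datasource.write.precombine.field": "ts",        # placeholder column
    "hoodie.datasource.write.operation": "insert_overwrite",
    # INSERT_DROP_DUPS_OPT_KEY left off, so rows that already exist in the
    # table are not filtered out of the incoming batch before the overwrite.
    "hoodie.datasource.write.insert.drop.duplicates": "false",
    # Ask for de-duplication within the incoming batch only.
    "hoodie.combine.before.insert": "true",
}

# With a SparkSession and a DataFrame `df`, the write would then be roughly:
#   df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
print(hudi_options["hoodie.datasource.write.operation"])
```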





[GitHub] [hudi] nsivabalan commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-930092640


   @am-cpp @ayush71994: sorry, this fell off my radar. Are you folks still interested in triaging this? I can assist you with it. Let me know. 





[GitHub] [hudi] ayush71994 commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
ayush71994 commented on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-848788466


   @nsivabalan 
   
   - Yes, when we don't set this flag, the incoming batch does contain duplicates. We are running our own compaction to remove them. 
   - Our reads are returning new records only, i.e. everything that was previously present in the partition is deleted.





[GitHub] [hudi] ayush71994 edited a comment on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
ayush71994 edited a comment on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-848788466


   @nsivabalan 
   
   - Yes, when we don't set this flag, the incoming batch does contain duplicates. We are running our own compaction to remove them. 
   - Our reads are returning new records only, i.e. everything that was previously present in the partition was deleted and overwritten with the incoming batch.





[GitHub] [hudi] am-cpp commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
am-cpp commented on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-847745284


   The issue seems to be happening only when the **INSERT_DROP_DUPS_OPT_KEY** flag is set to **true**.  Looks like this config is being used for both:
   
   1. Pre-combining: [link](https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L182)
   2. Deleting records already present in the table: [link](https://github.com/apache/hudi/blob/master/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L158)
   
   As far as the behavior of the insert overwrite API is concerned, it should always delete the partition and copy the incoming records. Drop duplicates should only pre-combine the input records.
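
The pre-combine-only behavior expected here can be sketched in plain Python: duplicates are resolved within the incoming batch alone, without consulting the existing table. This is an illustration of the expectation, not Hudi's implementation; the `ts` ordering column, the sample rows, and the rule that later rows win ties are all assumptions.

```python
def precombine(batch, key_fields=("typeId", "recordKey"), ordering_field="ts"):
    """Keep one row per key: the one with the largest ordering value."""
    latest = {}
    for row in batch:
        key = tuple(row[f] for f in key_fields)
        # Later rows win ties; Hudi's actual tie-breaking may differ.
        if key not in latest or row[ordering_field] >= latest[key][ordering_field]:
            latest[key] = row
    return list(latest.values())


# Hypothetical incoming batch with two versions of key1.
batch = [
    {"typeId": 1, "recordKey": "key1", "str": "old", "ts": 1},
    {"typeId": 1, "recordKey": "key1", "str": "new", "ts": 2},
    {"typeId": 2, "recordKey": "key4", "str": "mno", "ts": 1},
]

deduped = precombine(batch)
print([row["str"] for row in deduped])
```

Under this expectation the de-duplicated batch would then replace the touched partitions wholesale, whether or not its keys already exist in the table.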
   
   
   
   





[GitHub] [hudi] nsivabalan commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-847899578


   @ayush71994: May I know which config you are referring to with "delete.duplicates"? Can you point me to the full config name from https://hudi.apache.org/docs/configurations.html?
   Also, with your insert overwrite operation, does your new dataframe have duplicates that you wish to dedup before overwriting? 





[GitHub] [hudi] am-cpp commented on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
am-cpp commented on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-848497460


   @nsivabalan 
   
   1. Yes, the configuration is https://hudi.apache.org/docs/configurations.html#INSERT_DROP_DUPS_OPT_KEY
   2. Yes, the incoming dataframe has multiple records for the same primary key, which we want to pre-combine/drop based on the column set via the https://hudi.apache.org/docs/configurations.html#PRECOMBINE_FIELD_OPT_KEY config.
   3. Yes, the partition and the incoming dataframe have matching data.
   





[GitHub] [hudi] nsivabalan edited a comment on issue #2992: [SUPPORT] Insert_Override Api not working as expected in Hudi 0.7.0

Posted by GitBox <gi...@apache.org>.
nsivabalan edited a comment on issue #2992:
URL: https://github.com/apache/hudi/issues/2992#issuecomment-847899578


   @ayush71994: May I know which config you are referring to with "delete.duplicates"? Can you point me to the full config name from https://hudi.apache.org/docs/configurations.html? Do you mean https://hudi.apache.org/docs/configurations.html#INSERT_DROP_DUPS_OPT_KEY ? 
   Also, with your insert overwrite operation, does your new dataframe have duplicates that you wish to dedup before overwriting? 

