Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/11/03 14:16:01 UTC
[GitHub] [hudi] RajasekarSribalan commented on issue #2214: [SUPPORT] Hudi Upsert but with duplicates record for same key
RajasekarSribalan commented on issue #2214:
URL: https://github.com/apache/hudi/issues/2214#issuecomment-720304578
Thanks @bvaradar for the response. Please find sample data from the table below. I am querying the Hudi table from Spark by id, and the query returns many duplicate rows for the same key.
**Hudi Upsert config:**
upsertDf.write
  .format("hudi")
  .option(OPERATION_OPT_KEY, "upsert")
  .option(PRECOMBINE_FIELD_OPT_KEY, "hudi_ingestion_at")
  .option(RECORDKEY_FIELD_OPT_KEY, hudi_key)
  .option(PARTITIONPATH_FIELD_OPT_KEY, "")
  .option(KEYGENERATOR_CLASS_OPT_KEY, classOf[NonpartitionedKeyGenerator].getName)
  .option(TABLE_NAME, tablename)
  .option(TABLE_TYPE_OPT_KEY, "COPY_ON_WRITE")
  .option(HoodieCompactionConfig.CLEANER_COMMITS_RETAINED_PROP, "2")
  .option(HoodieCompactionConfig.MIN_COMMITS_TO_KEEP_PROP, "3")
  .option(HoodieCompactionConfig.MAX_COMMITS_TO_KEEP_PROP, "4")
  .option(HIVE_SYNC_ENABLED_OPT_KEY, "true")
  .option(HIVE_URL_OPT_KEY, "jdbc:hive2://XXXXXXXX")
  .option(HIVE_DATABASE_OPT_KEY, hudi_db)
  .option(HIVE_TABLE_OPT_KEY, tablename)
  .option(HIVE_PARTITION_EXTRACTOR_CLASS_OPT_KEY, classOf[NonPartitionedExtractor].getName)
  .option(HoodieStorageConfig.PARQUET_COMPRESSION_CODEC, "snappy")
  .option("hoodie.upsert.shuffle.parallelism", "100")
  .mode(Append)
  .save("/user/XXXXXXX/hudi/" + path + "/" + tablename)
**Sample data with duplicates**
+-------------------+-----------------------+------------------+----------------------+---------------------------------------------------------------------------+--------------+-------------------+--------+----------+-------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-----------+-----------+-----------+---------------+------------+---------------+-------------------+-------------------+-----------+-----------+-----------+-----------+-----------+------------+------------+
|_hoodie_commit_time|_hoodie_commit_seqno |_hoodie_record_key|_hoodie_partition_path|_hoodie_file_name |rds_shard_name|hudi_ingestion_at |id |account_id|flexifield_id|text_01|text_02|text_03|text_04|text_05|text_06|text_07|text_08|text_09|text_10|slt_text_11|slt_text_12|int_text_13|decimal_text_14|date_text_15|boolean_text_16|created_at |updated_at |mlt_text_17|mlt_text_18|mlt_text_19|mlt_text_20|mlt_text_21|lock_version|eslt_text_22|
+-------------------+-----------------------+------------------+----------------------+---------------------------------------------------------------------------+--------------+-------------------+--------+----------+-------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-----------+-----------+-----------+---------------+------------+---------------+-------------------+-------------------+-----------+-----------+-----------+-----------+-----------+------------+------------+
|20201030005747 |20201030005747_8_219096|37599142 | |613096d4-72b2-4c0e-b5af-364c3e2305dd-0_8-1145-226826_20201030005747.parquet|XXXXXX_shard2|2020-10-30 00:14:24|37599142|108121 |1160018262 |null |null |null |null |null |null |null |null |null |null |--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|2020-09-22 05:58:05|2020-10-30 00:14:24|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|38 |--- {}
|
|20201030005747 |20201030005747_8_219097|37599142 | |613096d4-72b2-4c0e-b5af-364c3e2305dd-0_8-1145-226826_20201030005747.parquet|XXXXXX_shard2|2020-10-30 00:14:24|37599142|108121 |1160018262 |null |null |null |null |null |null |null |null |null |null |--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|2020-09-22 05:58:05|2020-10-30 00:14:24|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|38 |--- {}
|
|20201030005747 |20201030005747_8_219098|37599142 | |613096d4-72b2-4c0e-b5af-364c3e2305dd-0_8-1145-226826_20201030005747.parquet|XXXXXX_shard2|2020-10-30 00:14:24|37599142|108121 |1160018262 |null |null |null |null |null |null |null |null |null |null |--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|2020-09-22 05:58:05|2020-10-30 00:14:24|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|38 |--- {}
|
|20201030005747 |20201030005747_8_219099|37599142 | |613096d4-72b2-4c0e-b5af-364c3e2305dd-0_8-1145-226826_20201030005747.parquet|XXXXXX_shard2|2020-10-30 00:14:24|37599142|108121 |1160018262 |null |null |null |null |null |null |null |null |null |null |--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|2020-09-22 05:58:05|2020-10-30 00:14:24|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|38 |--- {}
|
|20201030005747 |20201030005747_8_219100|37599142 | |613096d4-72b2-4c0e-b5af-364c3e2305dd-0_8-1145-226826_20201030005747.parquet|XXXXXX_shard2|2020-10-30 00:14:24|37599142|108121 |1160018262 |null |null |null |null |null |null |null |null |null |null |--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|2020-09-22 05:58:05|2020-10-30 00:14:24|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|38 |--- {}
|
|20201030005747 |20201030005747_8_219101|37599142 | |613096d4-72b2-4c0e-b5af-364c3e2305dd-0_8-1145-226826_20201030005747.parquet|XXXXXX_shard2|2020-10-30 00:14:24|37599142|108121 |1160018262 |null |null |null |null |null |null |null |null |null |null |--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|2020-09-22 05:58:05|2020-10-30 00:14:24|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|38 |--- {}
|
|20201030005747 |20201030005747_8_219102|37599142 | |613096d4-72b2-4c0e-b5af-364c3e2305dd-0_8-1145-226826_20201030005747.parquet|XXXXXX_shard2|2020-10-30 00:14:24|37599142|108121 |1160018262 |null |null |null |null |null |null |null |null |null |null |--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|2020-09-22 05:58:05|2020-10-30 00:14:24|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|38 |--- {}
|
|20201030005747 |20201030005747_8_219103|37599142 | |613096d4-72b2-4c0e-b5af-364c3e2305dd-0_8-1145-226826_20201030005747.parquet|XXXXXX_shard2|2020-10-30 00:14:24|37599142|108121 |1160018262 |null |null |null |null |null |null |null |null |null |null |--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|2020-09-22 05:58:05|2020-10-30 00:14:24|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|38 |--- {}
|
|20201030005747 |20201030005747_8_219104|37599142 | |613096d4-72b2-4c0e-b5af-364c3e2305dd-0_8-1145-226826_20201030005747.parquet|XXXXXX_shard2|2020-10-30 00:14:24|37599142|108121 |1160018262 |null |null |null |null |null |null |null |null |null |null |--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|2020-09-22 05:58:05|2020-10-30 00:14:24|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|38 |--- {}
|
|20201030005747 |20201030005747_8_219105|37599142 | |613096d4-72b2-4c0e-b5af-364c3e2305dd-0_8-1145-226826_20201030005747.parquet|XXXXXX_shard2|2020-10-30 00:14:24|37599142|108121 |1160018262 |null |null |null |null |null |null |null |null |null |null |--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|2020-09-22 05:58:05|2020-10-30 00:14:24|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|38 |--- {}
|
|20201030005747 |20201030005747_8_219106|37599142 | |613096d4-72b2-4c0e-b5af-364c3e2305dd-0_8-1145-226826_20201030005747.parquet|XXXXXX_shard2|2020-10-30 00:14:24|37599142|108121 |1160018262 |null |null |null |null |null |null |null |null |null |null |--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|2020-09-22 05:58:05|2020-10-30 00:14:24|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|38 |--- {}
|
|20201030005747 |20201030005747_8_219107|37599142 | |613096d4-72b2-4c0e-b5af-364c3e2305dd-0_8-1145-226826_20201030005747.parquet|XXXXXX_shard2|2020-10-30 00:14:24|37599142|108121 |1160018262 |null |null |null |null |null |null |null |null |null |null |--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|2020-09-22 05:58:05|2020-10-30 00:14:24|--- {}
|--- {}
|--- {}
|--- {}
|--- {}
|38 |--- {}
|
+-------------------+-----------------------+------------------+----------------------+---------------------------------------------------------------------------+--------------+-------------------+--------+----------+-------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-----------+-----------+-----------+---------------+------------+---------------+-------------------+-------------------+-----------+-----------+-----------+-----------+-----------+------------+------------+
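In the sample above, all twelve rows share the same `_hoodie_record_key`, `_hoodie_commit_time` and `_hoodie_file_name`, and differ only in `_hoodie_commit_seqno`. A check along the following lines could summarise how widespread that is across the table. This is only a sketch: `DuplicateKeyCheck` and `duplicateKeys` are hypothetical names, the non-duplicate key `"1"` in the stand-in data is made up for illustration, and in practice the input would be the DataFrame read from the Hudi table rather than the inline sample.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, count, countDistinct, lit}

object DuplicateKeyCheck {

  // Hypothetical helper: one output row per record key that occurs more than once,
  // with the number of copies and how many distinct files and commits they span.
  def duplicateKeys(df: DataFrame): DataFrame =
    df.groupBy(col("_hoodie_record_key"))
      .agg(
        count(lit(1)).as("copies"),
        countDistinct(col("_hoodie_file_name")).as("files"),
        countDistinct(col("_hoodie_commit_time")).as("commits"))
      .filter(col("copies") > 1)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("hudi-duplicate-key-check")
      .getOrCreate()
    import spark.implicits._

    val file = "613096d4-72b2-4c0e-b5af-364c3e2305dd-0_8-1145-226826_20201030005747.parquet"

    // Stand-in rows: three taken from the sample above, plus one made-up key ("1")
    // that is not duplicated. Replace with the DataFrame loaded from the Hudi table.
    val sample = Seq(
      ("20201030005747", "20201030005747_8_219096", "37599142", file),
      ("20201030005747", "20201030005747_8_219097", "37599142", file),
      ("20201030005747", "20201030005747_8_219098", "37599142", file),
      ("20201030005747", "20201030005747_8_000001", "1", file)
    ).toDF("_hoodie_commit_time", "_hoodie_commit_seqno", "_hoodie_record_key", "_hoodie_file_name")

    // Only key 37599142 is reported: copies = 3, files = 1, commits = 1.
    duplicateKeys(sample).show(false)

    spark.stop()
  }
}
```

If `files` and `commits` are both 1 for the affected keys, the copies were written into a single file by a single commit, which narrows down where to look.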