Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/04/22 22:23:04 UTC

[GitHub] [incubator-hudi] harshi2506 edited a comment on issue #1552: Time taken for upserting hudi table is increasing with increase in number of partitions

harshi2506 edited a comment on issue #1552:
URL: https://github.com/apache/incubator-hudi/issues/1552#issuecomment-618070975


   Hi @lamber-ken, I am already setting the parallelism to 200:
   hudiOptions += (HoodieWriteConfig.TABLE_NAME -> "table_name",
             DataSourceWriteOptions.STORAGE_TYPE_OPT_KEY -> "COPY_ON_WRITE",
             DataSourceWriteOptions.RECORDKEY_FIELD_OPT_KEY -> "primaryKey",
             DataSourceWriteOptions.PRECOMBINE_FIELD_OPT_KEY -> "dedupeKey",
             DataSourceWriteOptions.PARTITIONPATH_FIELD_OPT_KEY -> "partition_key", // date in format yyyy/mm/dd
             "hoodie.upsert.shuffle.parallelism" -> "200",
             DataSourceWriteOptions.OPERATION_OPT_KEY -> DataSourceWriteOptions.UPSERT_OPERATION_OPT_VAL)
   inputDF.write.format("org.apache.hudi").options(hudiOptions).mode(SaveMode.Append).save(load_path)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org