Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/01 12:49:26 UTC

[GitHub] [hudi] nochimow commented on issue #3431: [SUPPORT] Failed to upsert for commit time

nochimow commented on issue #3431:
URL: https://github.com/apache/hudi/issues/3431#issuecomment-910251510


   Hi,
   Thanks for the reply.
   
   I tried several ways to change the memory and memory-overhead parameters, without success.
   Since I am using AWS Glue to run this, I opened a ticket with AWS support and received this response:
   
   
   _These 'conf' settings are not available for override. [1] This allows AWS to manage the resources dynamically and provide efficient performance. Below are  several argument names used by AWS Glue internally that you should never set:
   
   --conf — Internal to AWS Glue. Do not set!
   --debug — Internal to AWS Glue. Do not set!
   --mode — Internal to AWS Glue. Do not set!
   --JOB_NAME — Internal to AWS Glue. Do not set!
   
   I am writing down the differences between the worker types below.
   
   For G.1X Worker nodes:
   
   The maximum amount of driver memory you can provide is 10GB.
   Each executor is configured with 10 GB memory
   Each executor is configured with 8 spark cores
   Each worker is configured with 1 executor
   Each worker maps to 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker.
   
   For G.2X Worker nodes:
   
   The maximum amount of driver memory you can provide is 20GB.
   Each executor is configured with 20 GB memory
   Each executor is configured with 16 spark cores
   Each worker is configured with 1 executor
   Each worker maps to 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker.
   
   Each executor has several task slots (or CPU cores) for running tasks in parallel [4].
   
   * numExecutors =
       * (DPU - 1) * 2 - 1 if WorkerType is Standard
       * (NumberOfWorkers - 1) if WorkerType is G.1X or G.2X

   * numSlotsPerExecutor =
       * 4 if WorkerType is Standard
       * 8 if WorkerType is G.1X
       * 16 if WorkerType is G.2X

   * numSlots = numSlotsPerExecutor * numExecutors_
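
   The formulas above can be sketched as a small Python helper (the function name and argument names are mine, just to illustrate the arithmetic from the AWS docs):

   ```python
   def glue_num_slots(worker_type, number_of_workers=None, dpu=None):
       """Compute total Spark task slots for a Glue job, per the formulas above."""
       if worker_type == "Standard":
           num_executors = (dpu - 1) * 2 - 1
           slots_per_executor = 4
       elif worker_type == "G.1X":
           num_executors = number_of_workers - 1
           slots_per_executor = 8
       elif worker_type == "G.2X":
           num_executors = number_of_workers - 1
           slots_per_executor = 16
       else:
           raise ValueError(f"unknown worker type: {worker_type}")
       return num_executors * slots_per_executor

   # e.g. the 14 G.2X workers used here give (14 - 1) * 16 = 208 slots
   print(glue_num_slots("G.2X", number_of_workers=14))  # 208
   ```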
   
   Reference: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
   
   So, in this case, the best option on AWS Glue is to use the G.2X machines, which we are already using. G.2X sets the following parameters by default (and they can't be overridden):
   
   --conf spark.dynamicAllocation.enabled=true 
   --conf spark.shuffle.service.enabled=true 
   --conf spark.dynamicAllocation.minExecutors=1 
   --conf spark.dynamicAllocation.maxExecutors=6 
   --conf spark.executor.memory=20g 
   --conf spark.executor.cores=16 
   --conf spark.driver.memory=20g
   --conf spark.default.parallelism=112 
   --conf spark.sql.shuffle.partitions=112 
   
   As I mentioned in my initial post, we used 14 G.2X machines and still got this error.
   Since these parameters can't be changed, is there any tuning that can be done on the Hudi configuration side? 
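
   For reference, these are the Hudi-side settings I was considering experimenting with (the keys come from Hudi's configuration reference; the values are guesses sized against the ~208 task slots and 20 GB executors above, not vetted recommendations):

   ```python
   # Illustrative Hudi write options; values are experiments, not recommendations
   hudi_tuning_options = {
       # spread the upsert/insert shuffles across the available task slots
       "hoodie.upsert.shuffle.parallelism": "208",
       "hoodie.insert.shuffle.parallelism": "208",
       # cap memory used per merge so executors stay within their 20 GB
       "hoodie.memory.merge.max.size": str(1024 * 1024 * 1024),  # 1 GB in bytes
   }

   # usage sketch: df.write.format("hudi").options(**hudi_tuning_options)...
   ```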


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org