Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/09/01 12:49:26 UTC

[GitHub] [hudi] nochimow commented on issue #3431: [SUPPORT] Failed to upsert for commit time

nochimow commented on issue #3431:
URL: https://github.com/apache/hudi/issues/3431#issuecomment-910251510


   Hi,
   Thanks for the reply.
   
   I tried several ways to change the memory and memory-overhead parameters, without success.
   Since I am using AWS Glue to run this, I opened a ticket with AWS support and received this response:
   
   
   _These 'conf' settings are not available for override. [1] This allows AWS to manage the resources dynamically and provide efficient performance. Below are  several argument names used by AWS Glue internally that you should never set:
   
   --conf — Internal to AWS Glue. Do not set!
   --debug — Internal to AWS Glue. Do not set!
   --mode — Internal to AWS Glue. Do not set!
   --JOB_NAME — Internal to AWS Glue. Do not set!
   
   I am writing down the differences between the worker types below.
   
   For G.1X Worker nodes:
   
   The maximum amount of driver memory you can provide is 10GB.
   Each executor is configured with 10 GB memory
   Each executor is configured with 8 spark cores
   Each worker is configured with 1 executor
   Each worker maps to 1 DPU (4 vCPU, 16 GB of memory, 64 GB disk), and provides 1 executor per worker.
   
   For G.2X Worker nodes:
   
   The maximum amount of driver memory you can provide is 20GB.
   Each executor is configured with 20 GB memory
   Each executor is configured with 16 spark cores
   Each worker is configured with 1 executor
   Each worker maps to 2 DPU (8 vCPU, 32 GB of memory, 128 GB disk), and provides 1 executor per worker.
   
   Each executor has several task slots (or CPU cores) for running tasks in parallel [4].
   
   * numExecutors =
       * (DPU - 1) * 2 - 1 if WorkerType is Standard
       * (NumberOfWorkers - 1) if WorkerType is G.1X or G.2X

   * numSlotsPerExecutor =
       * 4 if WorkerType is Standard
       * 8 if WorkerType is G.1X
       * 16 if WorkerType is G.2X

   * numSlots = numSlotsPerExecutor * numExecutors_
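
   The formulas above can be sketched as a small Python helper (the function name and argument names are mine, just to illustrate the arithmetic from the AWS docs):

   ```python
   def glue_num_slots(worker_type, number_of_workers=None, dpu=None):
       """Compute total Spark task slots for a Glue job, per the formulas above."""
       if worker_type == "Standard":
           num_executors = (dpu - 1) * 2 - 1
           slots_per_executor = 4
       elif worker_type == "G.1X":
           num_executors = number_of_workers - 1
           slots_per_executor = 8
       elif worker_type == "G.2X":
           num_executors = number_of_workers - 1
           slots_per_executor = 16
       else:
           raise ValueError(f"unknown worker type: {worker_type}")
       return num_executors * slots_per_executor

   # e.g. the 14 G.2X workers used here give (14 - 1) * 16 = 208 slots
   print(glue_num_slots("G.2X", number_of_workers=14))  # 208
   ```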
   
   Reference: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html
   
   So, in this case, the best option on AWS Glue is to use the G.2X machines, which we are already using. G.2X sets the following parameters by default (and they can't be overridden):
   
   --conf spark.dynamicAllocation.enabled=true 
   --conf spark.shuffle.service.enabled=true 
   --conf spark.dynamicAllocation.minExecutors=1 
   --conf spark.dynamicAllocation.maxExecutors=6 
   --conf spark.executor.memory=20g 
   --conf spark.executor.cores=16 
   --conf spark.driver.memory=20g
   --conf spark.default.parallelism=112 
   --conf spark.sql.shuffle.partitions=112 
   
   As I mentioned in my initial post, we used 14 G.2X machines and still got this error.
   Since these parameters can't be changed, is there any tuning that can be done on the Hudi configuration side? 
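
   For reference, these are the Hudi-side settings I was considering experimenting with (the keys come from Hudi's configuration reference; the values are guesses sized against the ~208 task slots and 20 GB executors above, not vetted recommendations):

   ```python
   # Illustrative Hudi write options; values are experiments, not recommendations
   hudi_tuning_options = {
       # spread the upsert/insert shuffles across the available task slots
       "hoodie.upsert.shuffle.parallelism": "208",
       "hoodie.insert.shuffle.parallelism": "208",
       # cap memory used per merge so executors stay within their 20 GB
       "hoodie.memory.merge.max.size": str(1024 * 1024 * 1024),  # 1 GB in bytes
   }

   # usage sketch: df.write.format("hudi").options(**hudi_tuning_options)...
   ```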


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org