Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/08/25 19:02:31 UTC

[GitHub] [hudi] nochimow edited a comment on issue #3431: [SUPPORT] Failed to upsert for commit time

nochimow edited a comment on issue #3431:
URL: https://github.com/apache/hudi/issues/3431#issuecomment-905793757


   Hi,
   Sorry for the delay in replying; I lost the execution history and had to run the same scenario again.
   Basically, we are ingesting 57 Avro files with typical sizes of 70-128 MB, about 2.75 GB of input data in total. We create a Spark DataFrame that loads all of these files and write it into Hudi.
   This data amounts to about 186 million rows.
   The table schema is composed of 7 string columns and 2 bigint columns, partitioned by 3 string columns (Day, Month, Year).
   I also checked that this table only receives inserts, and this execution touched a single partition (the current day).
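   For reference, a minimal sketch of the write path described above, assuming PySpark on Glue with the spark-avro and Hudi libraries available. The bucket paths, table name, and the record-key/precombine column names are placeholders, not values from the actual job.

   ```python
   # Minimal sketch of the ingestion described above (assumptions noted inline).
   from pyspark.sql import SparkSession

   spark = SparkSession.builder.appName("hudi-ingest-sketch").getOrCreate()

   # Load the ~57 Avro input files (~70-128 MB each) into a single DataFrame.
   # Requires the spark-avro package to be on the classpath.
   df = spark.read.format("avro").load("s3://my-bucket/input/*.avro")  # placeholder path

   hudi_options = {
       "hoodie.table.name": "my_table",                          # assumed table name
       "hoodie.datasource.write.recordkey.field": "record_id",   # assumed key column
       "hoodie.datasource.write.precombine.field": "event_ts",   # assumed ordering column
       "hoodie.datasource.write.partitionpath.field": "year,month,day",
       "hoodie.datasource.write.operation": "upsert",
       "hoodie.datasource.write.keygenerator.class":
           "org.apache.hudi.keygen.ComplexKeyGenerator",
   }

   # Append the ~186 million rows into the Hudi table on S3.
   (df.write.format("hudi")
      .options(**hudi_options)
      .mode("append")
      .save("s3://my-bucket/hudi/my_table/"))  # placeholder path
   ```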
   
   I also attached the detailed Glue job driver logs as 2 .csv files
   [start.csv](https://github.com/apache/hudi/files/7049222/start.csv)
   [end.csv](https://github.com/apache/hudi/files/7049220/end.csv)
   
    (the start and the end). As far as I saw, the middle part doesn't show anything useful, but if you need it, please let me know.
   
   Also, in this case we are running the Glue job with the following infrastructure parameters:
   Worker Type: G.2X
   Number of Workers: 7
   Max Concurrency: 999
   Job Timeout: 2880 minutes
   
   We also did some tests in the past with 14 workers and hit the same issue.
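   For completeness, this is roughly how that job sizing could be expressed with boto3; the job name, role, script location, and Glue version are placeholders and not taken from the actual job definition.

   ```python
   # Rough sketch of the Glue job configuration listed above (placeholders noted inline).
   import boto3

   glue = boto3.client("glue")

   glue.create_job(
       Name="hudi-ingest-job",                                   # placeholder name
       Role="arn:aws:iam::123456789012:role/GlueJobRole",        # placeholder role
       Command={"Name": "glueetl",
                "ScriptLocation": "s3://my-bucket/scripts/job.py",  # placeholder script
                "PythonVersion": "3"},
       WorkerType="G.2X",
       NumberOfWorkers=7,                          # 14 workers were also tried, same issue
       Timeout=2880,                               # minutes
       ExecutionProperty={"MaxConcurrentRuns": 999},
       GlueVersion="2.0",                          # assumed; not stated in the report
   )
   ```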
   
   Thank you in advance.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org