Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/09 04:42:15 UTC

[GitHub] [hudi] harsh1231 edited a comment on issue #4745: [SUPPORT] Bulk Insert into COW table slow

harsh1231 edited a comment on issue #4745:
URL: https://github.com/apache/hudi/issues/4745#issuecomment-1033341751


   @harishraju-govindaraju Can you check the Hudi CLI stats commands described at https://hudi.apache.org/docs/0.5.0/admin_guide/ ?
   * `stats filesizes` - File Sizes. Display summary stats on sizes of files
   * `stats wa` - Write Amplification. Ratio of how many records were upserted to how many records were actually written
   
   Also, can you share stage-level Spark UI screenshots?
   The performance of an upsert operation depends on how much the underlying dataset overlaps with the incoming dataset.
   Looking at the overall job stats (792 tasks), check whether small files were created during the initial load of data.
   If you have a large number of small files, set `hoodie.copyonwrite.record.size.estimate=100` during the first load of data.
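
As a rough sketch of how that setting could be derived and passed, the snippet below computes an average bytes-per-record figure from a sample and puts it into a Hudi options dict. The helper name `hudi_initial_load_options`, the table name, and the sample numbers are hypothetical. Using a simple average over a sample as the estimate is an assumption, not a documented recommendation.

```python
def hudi_initial_load_options(table_name, total_bytes, total_records):
    """Build Hudi write options for a first load (hypothetical helper).

    total_bytes / total_records is assumed to come from a sample of
    already-written files; it is only an approximation of record size.
    """
    if total_records <= 0:
        raise ValueError("total_records must be positive")
    # Average bytes per record, rounded up, never below 1.
    estimate = max(1, -(-total_bytes // total_records))
    return {
        "hoodie.table.name": table_name,
        "hoodie.copyonwrite.record.size.estimate": str(estimate),
    }


# Hypothetical sample: 1,000,000 bytes holding 10,000 records.
opts = hudi_initial_load_options("my_table", total_bytes=1_000_000, total_records=10_000)
print(opts)

# With PySpark and the Hudi bundle available, the options would typically be
# passed to the writer along with the table's other required write options:
#   df.write.format("hudi").options(**opts).mode("overwrite").save(base_path)
```

The value should be rechecked against `stats filesizes` after the first load, since a poor estimate can itself lead to badly sized files.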
   
   
   

