Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/09 04:42:15 UTC
[GitHub] [hudi] harsh1231 edited a comment on issue #4745: [SUPPORT] Bulk Insert into COW table slow
harsh1231 edited a comment on issue #4745:
URL: https://github.com/apache/hudi/issues/4745#issuecomment-1033341751
@harishraju-govindaraju Can you check the CLI stats commands described in https://hudi.apache.org/docs/0.5.0/admin_guide/ :
```
* stats filesizes - File Sizes. Display summary stats on sizes of files
* stats wa - Write Amplification. Ratio of how many records were upserted to how many records were actually written
```
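To make the `stats wa` figure concrete, here is a minimal Python sketch. It assumes the ratio is expressed as records written per record upserted, so 1.0 means only the updated records were rewritten and larger values mean untouched records were rewritten alongside them (which copy-on-write does when it rewrites whole files). The `CommitStats` class, its fields and `write_amplification` are hypothetical stand-ins for Hudi's commit metadata, not Hudi's API.

```python
from dataclasses import dataclass


@dataclass
class CommitStats:
    """Hypothetical per-commit counters standing in for Hudi commit metadata."""
    commit_time: str
    records_upserted: int  # records that arrived as updates in this commit
    records_written: int   # total records written to disk in this commit


def write_amplification(commits):
    """Return (per-commit ratios, overall ratio) of written / upserted.

    A commit with no upserted records (e.g. a pure initial load) gets None,
    because the ratio is undefined for it.
    """
    per_commit = {}
    total_upserted = 0
    total_written = 0
    for c in commits:
        per_commit[c.commit_time] = (
            c.records_written / c.records_upserted if c.records_upserted else None
        )
        total_upserted += c.records_upserted
        total_written += c.records_written
    overall = total_written / total_upserted if total_upserted else None
    return per_commit, overall


per, overall = write_amplification([
    CommitStats("001", 0, 1_000_000),   # initial load, no updates
    CommitStats("002", 1_000, 250_000), # 1,000 updates forced 250,000 rewrites
])
print(per, overall)
```

In this made-up example, commit `002` rewrote 250 records for every record it updated, which is the kind of number that points at updates being spread thinly across many files.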
Also, can you share stage-level Spark UI screenshots?
The performance of an upsert operation depends on how much the underlying dataset overlaps with the incoming dataset.
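The effect of overlap can be illustrated with a simplified model in Python. This is a sketch under stated assumptions: the table is copy-on-write, so every file holding at least one updated key is rewritten in full, and index lookup cost and the packing of new inserts into existing small files are ignored. `upsert_overlap` and the key-to-file mapping are hypothetical, not part of Hudi.

```python
def upsert_overlap(existing_key_to_file, incoming_keys):
    """Split an incoming batch into updates and inserts.

    existing_key_to_file: dict mapping record key -> file id already in the table
    incoming_keys: iterable of record keys in the incoming batch
    Returns (fraction of incoming keys that are updates, number of inserts,
    set of file ids that a copy-on-write upsert would rewrite).
    """
    incoming = set(incoming_keys)
    updates = {k for k in incoming if k in existing_key_to_file}
    inserts = incoming - updates
    # Copy-on-write: any file containing an updated key is rewritten in full.
    files_to_rewrite = {existing_key_to_file[k] for k in updates}
    overlap = len(updates) / len(incoming) if incoming else 0.0
    return overlap, len(inserts), files_to_rewrite


existing = {"a": "f1", "b": "f1", "c": "f2", "d": "f3"}
print(upsert_overlap(existing, ["a", "c", "x", "y"]))
```

Half of this toy batch overlaps the table, yet two of the three files must be rewritten; with many large files and scattered keys, a small overlap can still touch most of the table.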
Looking at the overall job stats (792 tasks), check whether small files were created during the initial load of data.
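If the CLI is not handy, a rough small-file check can be done directly on the files. A minimal Python sketch, assuming the table is reachable as a local path (S3 or HDFS would need their own listing calls) and assuming a 100 MB threshold that mirrors the default of `hoodie.parquet.small.file.limit`; replace it with your table's setting. `summarize_sizes` and `parquet_sizes` are hypothetical helpers.

```python
import os
import statistics

# Assumption: 100 MB mirrors the default of hoodie.parquet.small.file.limit.
SMALL_FILE_LIMIT_BYTES = 100 * 1024 * 1024


def summarize_sizes(sizes, small_limit=SMALL_FILE_LIMIT_BYTES):
    """Summary stats for file sizes in bytes, plus a count of small files."""
    if not sizes:
        return {"count": 0, "small": 0}
    return {
        "count": len(sizes),
        "min": min(sizes),
        "max": max(sizes),
        "median": statistics.median(sizes),
        "small": sum(1 for s in sizes if s < small_limit),
    }


def parquet_sizes(base_path):
    """Collect sizes of .parquet files under a locally accessible table path."""
    sizes = []
    for root, dirs, files in os.walk(base_path):
        # Do not descend into Hudi's metadata folder.
        dirs[:] = [d for d in dirs if d != ".hoodie"]
        sizes.extend(
            os.path.getsize(os.path.join(root, f))
            for f in files
            if f.endswith(".parquet")
        )
    return sizes
```

This counts every file version on disk, including older file slices not yet cleaned, so treat it as a quick sanity check; `stats filesizes` reads the commit metadata and is the better source.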
If you have a large number of small files, set `hoodie.copyonwrite.record.size.estimate=100` during the first load of data.
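Why the estimate matters on the first load can be shown with some arithmetic. A minimal Python sketch, assuming Hudi sizes insert buckets as roughly max file size divided by the per-record size estimate when no earlier commits exist to measure from, and assuming defaults of 120 MB for `hoodie.parquet.max.file.size` and 1024 bytes for the estimate (check the defaults for your version). `plan_initial_files` is a hypothetical helper, and `actual_record_size` means average bytes per record as written to disk.

```python
import math

# Assumption: 120 MB default for hoodie.parquet.max.file.size.
MAX_FILE_SIZE_BYTES = 120 * 1024 * 1024


def plan_initial_files(num_records, size_estimate, actual_record_size,
                       max_file_size=MAX_FILE_SIZE_BYTES):
    """Rough model of how a first load splits inserts into files.

    Records per file comes from the estimate (there is no commit history yet),
    while the bytes each file really ends up with depend on the real record size.
    """
    records_per_file = max_file_size // size_estimate
    num_files = math.ceil(num_records / records_per_file)
    approx_file_bytes = records_per_file * actual_record_size
    return records_per_file, num_files, approx_file_bytes


print(plan_initial_files(10_000_000, 1024, 100))  # default-sized estimate
print(plan_initial_files(10_000_000, 100, 100))   # estimate close to reality
```

With 10 million records that really average about 100 bytes on disk, an estimate of 1024 plans 82 files of roughly 12 MB each, while an estimate of 100 plans 8 files near the 120 MB target. Those undersized first-load files are the small files that later upserts keep rewriting.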