Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2023/03/30 02:24:00 UTC

[jira] [Updated] (HUDI-5289) WriteStatus RDD is recalculated in cluster

     [ https://issues.apache.org/jira/browse/HUDI-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-5289:
--------------------------------------
    Fix Version/s: 0.13.1
                   0.12.3

> WriteStatus RDD is recalculated in cluster
> ------------------------------------------
>
>                 Key: HUDI-5289
>                 URL: https://issues.apache.org/jira/browse/HUDI-5289
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: spark
>            Reporter: zouxxyy
>            Assignee: zouxxyy
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.13.1, 0.12.3
>
>         Attachments: image-2022-11-29-10-24-08-853.png, image-2022-11-29-10-25-29-546.png, image-2022-11-29-10-26-22-050.png
>
>
> Steps to reproduce:
> {code:java}
> spark-submit \
> --class org.apache.hudi.utilities.HoodieClusteringJob \
> --conf spark.driver.memory=40G \
> --conf spark.executor.instances=20 \
> --conf spark.executor.memory=40G \
> --conf spark.executor.cores=4 \
> hudi-utilities-bundle_2.11-0.12.0.jar \
> --props clusteringjob.properties \
> --mode scheduleAndExecute \
> --base-path xxx \
> --table-name xxx \
> --spark-memory 40g
> {code}
> The following are the two stages of the job. Both are related to the calculation of WriteStatus, but some tasks in stage 96 were recalculated, which took more than ten minutes.
> !image-2022-11-29-10-24-08-853.png|width=1560,height=57!
> Here is stage 65:
> !image-2022-11-29-10-25-29-546.png|width=640,height=515!
> Here is stage 96:
> !image-2022-11-29-10-26-22-050.png|width=643,height=435!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)