Posted to commits@hudi.apache.org by "Yanjia Gary Li (Jira)" <ji...@apache.org> on 2020/01/03 05:08:00 UTC

[jira] [Updated] (HUDI-494) [DEBUGGING] Huge amount of tasks when writing files into HDFS

     [ https://issues.apache.org/jira/browse/HUDI-494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yanjia Gary Li updated HUDI-494:
--------------------------------
    Attachment: Screen Shot 2020-01-02 at 8.53.24 PM.png

> [DEBUGGING] Huge amount of tasks when writing files into HDFS
> -------------------------------------------------------------
>
>                 Key: HUDI-494
>                 URL: https://issues.apache.org/jira/browse/HUDI-494
>             Project: Apache Hudi (incubating)
>          Issue Type: Test
>            Reporter: Yanjia Gary Li
>            Assignee: Vinoth Chandar
>            Priority: Major
>         Attachments: Screen Shot 2020-01-02 at 8.53.24 PM.png, Screen Shot 2020-01-02 at 8.53.44 PM.png
>
>
> I am using a manual build of master after the [https://github.com/apache/incubator-hudi/commit/36b3b6f5dd913d3f1c9aa116aff8daf6540fed65] commit.
> I am seeing 3 million tasks when the Hudi Spark job writes files into HDFS.
> I am also seeing a huge number of 0-byte files being written into the .hoodie/.temp/ folder in my HDFS. In the Spark UI, each task writes fewer than 10 records in
> {code:java}
> count at HoodieSparkSqlWriter{code}
>  All the stages before this one seem normal. Any idea what happened here?
>  
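> For reference, this is roughly the shape of the write (a minimal sketch, not my exact job; inputDF, the table name, and the record key/partition/precombine fields below are placeholders). I am checking whether the shuffle parallelism configs explain the task count, since they bound the number of insert/upsert shuffle tasks:
> {code:scala}
> // Sketch of the Hudi DataSource write path with explicit shuffle parallelism.
> // inputDF, the table name, and the field names are placeholders.
> inputDF.write
>   .format("org.apache.hudi")
>   .option("hoodie.table.name", "debug_table")
>   .option("hoodie.datasource.write.recordkey.field", "id")
>   .option("hoodie.datasource.write.partitionpath.field", "dt")
>   .option("hoodie.datasource.write.precombine.field", "ts")
>   // These configs bound the number of insert/upsert shuffle tasks; if left
>   // very large, Spark spawns that many write tasks, each emitting a tiny
>   // (or empty) file under .hoodie/.temp/.
>   .option("hoodie.insert.shuffle.parallelism", "200")
>   .option("hoodie.upsert.shuffle.parallelism", "200")
>   .mode("append")
>   .save("hdfs:///tmp/hudi/debug_table"){code}
>  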



--
This message was sent by Atlassian Jira
(v8.3.4#803005)