Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2021/10/08 01:36:00 UTC

[jira] [Commented] (HUDI-2363) COW : Listing leaf files and directories twice

    [ https://issues.apache.org/jira/browse/HUDI-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17425904#comment-17425904 ] 

Vinoth Chandar commented on HUDI-2363:
--------------------------------------

I think these have long been fixed in recent releases (0.7.0, IIRC). Are you able to try out a newer version?

> COW : Listing leaf files and directories twice
> ----------------------------------------------
>
>                 Key: HUDI-2363
>                 URL: https://issues.apache.org/jira/browse/HUDI-2363
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Writer Core
>            Reporter: selvaraj
>            Priority: Major
>         Attachments: Screen Shot 2021-08-25 at 5.36.52 PM.png
>
>
> Team,
> In our organization we are still using Hudi 0.5.0. We plan to upgrade to the latest version in a couple of quarters.
> Problem scenario:
> Many use cases in our project use COW with Hive sync disabled. One of the Hudi tables contains two years' worth of data, partitioned by date. For every write on this table, I notice that the "Listing leaf files and directories" job is triggered twice. Normally it is triggered only once. The screenshot is attached.
>  
> Once the first listing of leaf files and directories is done, log lines for another listing of leaf files and directories appear.
> I spent some time investigating the source code but couldn't trace where exactly it is being invoked.
>  
> How can it be avoided here? Unfortunately, this adds more latency to our flow.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)