Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2021/11/06 02:46:00 UTC

[jira] [Commented] (HUDI-1492) Handle DeltaWriteStat correctly for storage schemes that support appends

    [ https://issues.apache.org/jira/browse/HUDI-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17439580#comment-17439580 ] 

Vinoth Chandar commented on HUDI-1492:
--------------------------------------

I am curious what we are going to do here for, say, column stats. We would probably need to merge the new stats with the old stats for the same log file. Bloom filters are also additive, so we are good there. But not every index will behave like that, so it is better to fix the delta commit metadata correctly.
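To make "merged with old stats" concrete, here is a minimal Java sketch. It assumes column stats are min/max/null counts over disjoint appended blocks of one log file, and that both bloom filters were built with the same bit size and hash functions. ColumnRange and unionBloomBits are hypothetical names for illustration, not Hudi classes. Both kinds of stats combine cleanly under these assumptions. An index whose entries cannot be combined this way is the case that still needs correct delta commit metadata.

```java
import java.util.BitSet;

// Sketch only: ColumnRange and unionBloomBits are hypothetical stand-ins,
// not Hudi's actual column-stats or bloom-filter classes.
public class StatsMergeSketch {

  /** Hypothetical min/max/null-count stats for one column of one log file. */
  static final class ColumnRange {
    final long min;
    final long max;
    final long nullCount;

    ColumnRange(long min, long max, long nullCount) {
      this.min = min;
      this.max = max;
      this.nullCount = nullCount;
    }

    /** Combine stats of earlier blocks with stats of a later append to the same log. */
    ColumnRange merge(ColumnRange appended) {
      return new ColumnRange(
          Math.min(min, appended.min),      // the value range can only widen
          Math.max(max, appended.max),
          nullCount + appended.nullCount);  // disjoint blocks, so null counts add up
    }
  }

  /**
   * Union of two bloom filters with the same bit size and hash functions:
   * a key that is "maybe present" in either input stays so in the result.
   */
  static BitSet unionBloomBits(BitSet older, BitSet appended) {
    BitSet merged = (BitSet) older.clone(); // leave the inputs untouched
    merged.or(appended);
    return merged;
  }

  public static void main(String[] args) {
    ColumnRange merged = new ColumnRange(10, 50, 1).merge(new ColumnRange(5, 40, 2));
    System.out.println(merged.min + " " + merged.max + " " + merged.nullCount); // prints "5 50 3"

    BitSet a = new BitSet(16);
    a.set(1);
    BitSet b = new BitSet(16);
    b.set(7);
    System.out.println(unionBloomBits(a, b)); // prints "{1, 7}"
  }
}
```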

That said, we have already put in code that merges these file names:

{code:java}
if (fileInfo.getIsDeleted()) {
  // file deletion
  combinedFileInfo.remove(filename);
}
{code}
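The fragment above shows only the deletion branch. Below is a minimal sketch of a combine step that also collapses appends. It assumes a hypothetical FileInfo carrying a size and a deleted flag. It also assumes an append only ever grows a log file, so the larger reported size is the current one. This is an illustration, not Hudi's actual metadata payload code; FileInfo, combine and the file names are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: FileInfo and combine() are hypothetical, not Hudi's metadata payload classes.
public class FileListingMergeSketch {

  /** Hypothetical per-file entry in a partition listing. */
  static final class FileInfo {
    final long size;
    final boolean isDeleted;

    FileInfo(long size, boolean isDeleted) {
      this.size = size;
      this.isDeleted = isDeleted;
    }
  }

  /** Merge a newer partition listing into an older one, keyed by file name. */
  static Map<String, FileInfo> combine(Map<String, FileInfo> older, Map<String, FileInfo> newer) {
    Map<String, FileInfo> combined = new HashMap<>(older);
    newer.forEach((filename, fileInfo) -> {
      if (fileInfo.isDeleted) {
        // file deletion
        combined.remove(filename);
      } else {
        // new file, or an append to an existing log file: keep a single entry per name,
        // assuming appends only grow the file so the larger size is the current one
        combined.merge(filename, fileInfo,
            (oldInfo, newInfo) -> new FileInfo(Math.max(oldInfo.size, newInfo.size), false));
      }
    });
    return combined;
  }

  public static void main(String[] args) {
    Map<String, FileInfo> older = new HashMap<>();
    older.put("file1.log", new FileInfo(100, false));
    older.put("file2.parquet", new FileInfo(500, false));

    Map<String, FileInfo> newer = new HashMap<>();
    newer.put("file1.log", new FileInfo(180, false));   // same log file, appended to
    newer.put("file2.parquet", new FileInfo(0, true));  // deleted

    Map<String, FileInfo> result = combine(older, newer);
    System.out.println(result.size() + " " + result.get("file1.log").size); // prints "1 180"
  }
}
```

Because entries are keyed by file name, a log file that shows up again in a later delta commit after an append collapses into a single entry rather than a duplicate.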

> Handle DeltaWriteStat correctly for storage schemes that support appends
> ------------------------------------------------------------------------
>
>                 Key: HUDI-1492
>                 URL: https://issues.apache.org/jira/browse/HUDI-1492
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Vinoth Chandar
>            Assignee: Vinoth Chandar
>            Priority: Blocker
>             Fix For: 0.10.0
>
>
> The current implementation simply uses
> {code:java}
> String pathWithPartition = hoodieWriteStat.getPath();
> {code}
> to write the metadata table. This is problematic if the delta write was merely an append, and it can technically add duplicate files to the metadata table.
> (Not sure if this is a problem per se, but filing a Jira to track it and either close or fix it.)
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)