Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/06/02 14:47:00 UTC

[jira] [Updated] (HUDI-1800) Incorrect HoodieTableFileSystem API usage for pending slices causing issues

     [ https://issues.apache.org/jira/browse/HUDI-1800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-1800:
--------------------------------------
    Status: Closed  (was: Patch Available)

> Incorrect HoodieTableFileSystem API usage for pending slices causing issues
> ---------------------------------------------------------------------------
>
>                 Key: HUDI-1800
>                 URL: https://issues.apache.org/jira/browse/HUDI-1800
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Writer Core
>            Reporter: Nishith Agarwal
>            Assignee: Ryan Pifer
>            Priority: Major
>              Labels: pull-request-available, sev:critical
>
> From [~vbalaji]
>  
> We are using the wrong FileSystemView API here:
> [https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85]
> We don't include file groups that are in pending compaction, but with the HBase index we do include them. With the current state of the code, including files in pending compaction is an issue.
> The API "getLatestFileSlicesBeforeOrOn" was originally intended for CompactionAdminClient, which uses it to find log files that were added after a pending compaction and rename them so that the effects of compaction scheduling can be undone. A different API, "getLatestMergedFileSlicesBeforeOrOn", gives a consolidated view of the latest file slice and includes all data both before and after compaction. This is what should be used in
> [https://github.com/apache/hudi/blob/release-0.6.0/hudi-client/src/main/java/org/apache/hudi/table/action/deltacommit/UpsertDeltaCommitPartitioner.java#L85]
> The other workaround would be to exclude file slices in pending compaction when we select small files, which avoids the interaction between the compactor and ingestion in this case. But I think we can go with the first option.
>  
> More details can be found here -> https://github.com/apache/hudi/issues/2633
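
The difference between the two views described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the Slice class, the latestSlice and latestMergedSlice helpers, and the file names are all hypothetical and are not Hudi's classes or its implementation. The model assumes that a pending compaction opens a new file slice that has no base file yet, and that log files written after scheduling attach to that new slice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PendingCompactionSketch {

    /** Toy file slice (hypothetical, not Hudi's FileSlice). */
    static final class Slice {
        final String baseInstant;
        final String baseFile;        // null while the compaction that would write it is pending
        final List<String> logFiles;

        Slice(String baseInstant, String baseFile, List<String> logFiles) {
            this.baseInstant = baseInstant;
            this.baseFile = baseFile;
            this.logFiles = logFiles;
        }
    }

    /** Plain view: returns the newest slice as-is (input is ordered oldest first). */
    static Slice latestSlice(List<Slice> slicesOldestFirst) {
        return slicesOldestFirst.get(slicesOldestFirst.size() - 1);
    }

    /**
     * Merged view: if the newest slice has no base file (pending compaction),
     * fold it into the previous slice so data from before and after
     * compaction scheduling is visible together.
     */
    static Slice latestMergedSlice(List<Slice> slicesOldestFirst) {
        Slice latest = latestSlice(slicesOldestFirst);
        if (latest.baseFile != null || slicesOldestFirst.size() < 2) {
            return latest;
        }
        Slice previous = slicesOldestFirst.get(slicesOldestFirst.size() - 2);
        List<String> logs = new ArrayList<>(previous.logFiles);
        logs.addAll(latest.logFiles);
        return new Slice(previous.baseInstant, previous.baseFile, logs);
    }

    /** One file group with a compaction scheduled at instant 003 but not yet run. */
    static List<Slice> exampleFileGroup() {
        Slice beforeCompaction = new Slice("001", "base-001.parquet", Arrays.asList("log-1"));
        Slice pendingCompaction = new Slice("003", null, Arrays.asList("log-2"));
        return Arrays.asList(beforeCompaction, pendingCompaction);
    }

    public static void main(String[] args) {
        List<Slice> group = exampleFileGroup();
        Slice plain = latestSlice(group);
        Slice merged = latestMergedSlice(group);
        System.out.println("plain:  base=" + plain.baseFile + " logs=" + plain.logFiles);
        System.out.println("merged: base=" + merged.baseFile + " logs=" + merged.logFiles);
    }
}
```

In this model the plain view exposes only the data written after compaction was scheduled, while the merged view also carries the earlier base file and log file. That is the reason the ticket prefers the merged view when the partitioner looks at file groups in pending compaction.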



--
This message was sent by Atlassian Jira
(v8.3.4#803005)