Posted to commits@hudi.apache.org by "Yue Zhang (Jira)" <ji...@apache.org> on 2022/04/03 07:16:00 UTC

[jira] [Commented] (HUDI-3650) Revisit all usages of filterPendingCompactionTimeline()

    [ https://issues.apache.org/jira/browse/HUDI-3650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17516464#comment-17516464 ] 

Yue Zhang commented on HUDI-3650:
---------------------------------

On the master branch, there are several places that call the filterPendingCompactionTimeline API:


1. BaseHoodieWriteClient#runTableServicesInline
2. BaseHoodieWriteClient#runAnyPendingCompactions
3. BaseHoodieWriteClient#startCommit

4. RunCompactionActionExecutor#execute
5. SparkRDDWriteClient#compact
6. TimelineDiffHelper#getPendingCompactionTransitions

7. CompactionUtils#getAllPendingCompactionPlans
8. CompactionUtils#getPendingCompactionInstantTimes
9. CompactionUtils#rollbackCompaction
10. CompactionUtils#rollbackEarliestCompaction

11. HoodieFlinkCompactor#compact
12. HoodieInputFormatUtils#filterInstantsTimeline
13. CompactNode#execute

 

Usages 1, 2, 4, 5, 9, 10, 11, and 13 are all used for the compaction action itself.

Usages 6, 7, and 8 all retrieve information about pending compactions.
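To make "pending compaction" concrete, here is a minimal sketch of the filtering idea, assuming (as a simplification) that an instant has a timestamp, an action name, and a state, and that a compaction counts as pending while it is requested or inflight. SimpleInstant and PendingCompactions are hypothetical stand-ins, not Hudi's HoodieInstant or HoodieTimeline API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified timeline instant; NOT Hudi's HoodieInstant class.
final class SimpleInstant {
    enum State { REQUESTED, INFLIGHT, COMPLETED }

    final String timestamp; // instant time, e.g. "20220403071600"
    final String action;    // e.g. "deltacommit" or "compaction"
    final State state;

    SimpleInstant(String timestamp, String action, State state) {
        this.timestamp = timestamp;
        this.action = action;
        this.state = state;
    }
}

// Hypothetical helper illustrating the filtering idea; NOT Hudi's HoodieTimeline API.
final class PendingCompactions {
    // A compaction counts as pending while it is REQUESTED or INFLIGHT.
    static List<SimpleInstant> filterPendingCompactions(List<SimpleInstant> timeline) {
        return timeline.stream()
            .filter(i -> i.action.equals("compaction") && i.state != SimpleInstant.State.COMPLETED)
            .collect(Collectors.toList());
    }

    // The kind of information usages 6-8 collect: instant times of all pending compactions.
    static List<String> pendingCompactionInstantTimes(List<SimpleInstant> timeline) {
        return filterPendingCompactions(timeline).stream()
            .map(i -> i.timestamp)
            .collect(Collectors.toList());
    }
}
```

For a timeline holding a completed delta commit at 001, a requested compaction at 002, an inflight compaction at 003 and a completed compaction at 004, pendingCompactionInstantTimes returns [002, 003].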

 

Usage 3 (BaseHoodieWriteClient#startCommit) validates the instant time of the commit being started: if there are pending compactions, their instant times must not be greater than the new commit's instant time.
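The guard in usage 3 can be sketched as follows. This assumes, hypothetically, that the pending compaction instant times are already available as a list of fixed-width timestamp strings, so that String.compareTo orders them chronologically; StartCommitGuard and checkNoLaterPendingCompaction are illustrative names, not Hudi code.

```java
import java.util.List;

// Hypothetical sketch of the check described for BaseHoodieWriteClient#startCommit;
// NOT Hudi's actual code.
final class StartCommitGuard {
    // Assumes instant times are fixed-width timestamp strings, so String.compareTo
    // orders them chronologically.
    static void checkNoLaterPendingCompaction(List<String> pendingCompactionTimes, String newInstantTime) {
        for (String pending : pendingCompactionTimes) {
            // A pending compaction whose instant time is greater than the new one violates the guard.
            if (pending.compareTo(newInstantTime) > 0) {
                throw new IllegalArgumentException("Pending compaction " + pending
                    + " is later than the new instant " + newInstantTime);
            }
        }
    }
}
```

For example, with pending compactions at 001 and 002, starting a commit at 003 passes, while starting a commit at 003 with a pending compaction at 005 throws.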

Usage 12 is HoodieInputFormatUtils#filterInstantsTimeline; here is its Javadoc:


  /**
   * Filter any specific instants that we do not want to process.
   * example timeline:
   *
   * t0 -> create bucket1.parquet
   * t1 -> create and append updates bucket1.log
   * t2 -> request compaction
   * t3 -> create bucket2.parquet
   *
   * if compaction at t2 takes a long time, incremental readers on RO tables can move to t3 and would skip updates in t1
   *
   * To workaround this problem, we want to stop returning data belonging to commits > t2.
   * After compaction is complete, incremental reader would see updates in t2, t3, so on.
   * @param timeline
   * @return
   */
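The rule this Javadoc describes can be sketched as below: find the earliest pending compaction and hide it and every later instant, so an incremental reader cannot move past t2 while that compaction is still running. This is a simplified model under stated assumptions (hypothetical TimelineInstant and IncrementalReadFilter classes, fixed-width timestamp strings), not the actual body of HoodieInputFormatUtils#filterInstantsTimeline.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical, simplified instant; NOT Hudi's HoodieInstant.
final class TimelineInstant {
    final String timestamp;
    final String action;
    final boolean completed;

    TimelineInstant(String timestamp, String action, boolean completed) {
        this.timestamp = timestamp;
        this.action = action;
        this.completed = completed;
    }
}

final class IncrementalReadFilter {
    // If any compaction is pending, keep only instants strictly before the earliest
    // pending compaction, so readers cannot advance past it and skip log updates.
    static List<TimelineInstant> filterInstants(List<TimelineInstant> timeline) {
        Optional<String> earliestPending = timeline.stream()
            .filter(i -> i.action.equals("compaction") && !i.completed)
            .map(i -> i.timestamp)
            .min(String::compareTo);
        if (!earliestPending.isPresent()) {
            return timeline; // nothing pending: expose the whole timeline
        }
        String cutoff = earliestPending.get();
        return timeline.stream()
            .filter(i -> i.timestamp.compareTo(cutoff) < 0)
            .collect(Collectors.toList());
    }
}
```

With the example timeline from the Javadoc (t0, t1 and t3 as completed delta commits, t2 as a requested compaction), only t0 and t1 remain visible; once the compaction at t2 completes, nothing is pending and the full timeline is returned again.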

 

Overall, I believe all current usages of filterPendingCompactionTimeline() are correct.

> Revisit all usages of filterPendingCompactionTimeline() 
> --------------------------------------------------------
>
>                 Key: HUDI-3650
>                 URL: https://issues.apache.org/jira/browse/HUDI-3650
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: Ethan Guo
>            Assignee: Yue Zhang
>            Priority: Blocker
>             Fix For: 0.11.0
>
>
> [https://github.com/apache/hudi/pull/4172/files]
>  
> We need to find all usages of filterPendingCompactionTimeline, getTimelineOfActions and replace them with new methods.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)