Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2023/03/29 21:56:00 UTC

[jira] [Commented] (HUDI-5816) Avoid loading archived timeline during meta sync

    [ https://issues.apache.org/jira/browse/HUDI-5816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17706620#comment-17706620 ] 

Ethan Guo commented on HUDI-5816:
---------------------------------

1. For syncing to Glue, advise users not to use Hive Sync.
2. Rewrite Hudi's GlueSync so that updating the timestamp checkpoint does not run into this versioning problem (TBD whether this is actually doable).
[https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog.html]
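3. [For both Glue and HMS] When sync falls behind the active timeline, force a full partition listing (e.g., HoodieMetadata#getAllPartitionPaths() or something similar), diff it against the metastore, and sync once. Then update the last-synced timestamp so that subsequent syncs follow the usual flow, reading only from the active timeline.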
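Option 3 can be sketched as a one-shot diff-and-sync. This is a language-neutral illustration in Python, not Hudi code: the callables passed to full_resync (listing table partitions, listing metastore partitions, adding/dropping partitions, persisting the checkpoint) are hypothetical stand-ins for whatever the Glue or HMS client and the metadata table actually provide.

```python
from typing import Callable, Iterable, Set, Tuple


def diff_partitions(table_partitions: Iterable[str],
                    metastore_partitions: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (to_add, to_drop) so the metastore matches the table."""
    table = set(table_partitions)
    registered = set(metastore_partitions)
    to_add = table - registered    # in the table, missing from the catalog
    to_drop = registered - table   # in the catalog, no longer in the table
    return to_add, to_drop


def full_resync(
    list_table_partitions: Callable[[], Iterable[str]],      # hypothetical: full listing, e.g. via metadata table
    list_metastore_partitions: Callable[[], Iterable[str]],  # hypothetical: catalog listing
    add_partitions: Callable[[Set[str]], None],              # hypothetical catalog write
    drop_partitions: Callable[[Set[str]], None],             # hypothetical catalog write
    set_last_synced_instant: Callable[[str], None],          # hypothetical checkpoint update
    latest_completed_instant: str,
) -> None:
    """Sync once from a full listing, then advance the checkpoint."""
    to_add, to_drop = diff_partitions(list_table_partitions(), list_metastore_partitions())
    if to_add:
        add_partitions(to_add)
    if to_drop:
        drop_partitions(to_drop)
    # Advancing the checkpoint lets later syncs read only the active timeline.
    set_last_synced_instant(latest_completed_instant)
```

For example, diff_partitions(["2023/03/28", "2023/03/29"], ["2023/03/27", "2023/03/28"]) returns ({"2023/03/29"}, {"2023/03/27"}). The cost of this fallback is one full listing per catch-up, which is cheap when the metadata table is enabled.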

> Avoid loading archived timeline during meta sync
> ------------------------------------------------
>
>                 Key: HUDI-5816
>                 URL: https://issues.apache.org/jira/browse/HUDI-5816
>             Project: Apache Hudi
>          Issue Type: Improvement
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Critical
>
> During meta sync, we still load the archived timeline when the last sync timestamp is earlier than the start of the active timeline. Instead, we can fall back to listing all partitions, which is faster if the metadata table is enabled.
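The decision described in the issue can be sketched as below. This is a hypothetical Python illustration, not Hudi's implementation; it assumes instant times share one fixed-width timestamp format (e.g. "20230329215600000"), so that lexicographic string comparison matches time order.

```python
from typing import Optional


def choose_sync_strategy(last_synced_instant: Optional[str],
                         earliest_active_instant: str) -> str:
    """Pick how meta sync discovers changed partitions (hypothetical helper)."""
    if last_synced_instant is None:
        # Never synced before: nothing to replay, so list everything.
        return "list_all_partitions"
    if last_synced_instant < earliest_active_instant:
        # Checkpoint points into the archived timeline: list all partitions
        # instead of loading archived commits.
        return "list_all_partitions"
    # Checkpoint is still covered by the active timeline: read commits after it.
    return "scan_active_timeline"
```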



--
This message was sent by Atlassian Jira
(v8.20.10#820010)