Posted to commits@hudi.apache.org by "Volodymyr Burenin (Jira)" <ji...@apache.org> on 2022/01/26 03:47:00 UTC

[jira] [Commented] (HUDI-2189) Delete partition support in HoodieDeltaStreamer

    [ https://issues.apache.org/jira/browse/HUDI-2189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17482216#comment-17482216 ] 

Volodymyr Burenin commented on HUDI-2189:
-----------------------------------------

There is a strong use case for this. We run DeltaStreamer under our own scheduler (which may eventually be open-sourced). It schedules all ingestion jobs, taking into account the amount of incoming data in the Kafka queue, latency requirements, etc. It also tracks the number of partitions and trims them when necessary; so far this is done surgically by modifying the metastore and removing data from storage. These tables are gigantic and their data goes stale very quickly, becoming essentially useless after 10-14 days. It needs to be trimmed; otherwise the cost of keeping it gets too high.
I would strongly recommend providing a way to tell DeltaStreamer which partitions need to be dropped, via the CLI or a properties file. Either works, since the scheduler generates all these parameters dynamically.

[~harsh1231] [~shivnarayan] [~codope] 

> Delete partition support in HoodieDeltaStreamer 
> ------------------------------------------------
>
>                 Key: HUDI-2189
>                 URL: https://issues.apache.org/jira/browse/HUDI-2189
>             Project: Apache Hudi
>          Issue Type: Task
>          Components: deltastreamer
>            Reporter: Samrat Deb
>            Assignee: sivabalan narayanan
>            Priority: Critical
>             Fix For: 0.11.0
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)