Posted to commits@hudi.apache.org by "Pratyaksh Sharma (Jira)" <ji...@apache.org> on 2022/03/07 18:19:00 UTC
[jira] [Assigned] (HUDI-2318) Enhance and stabilize multi-table deltastreamer
[ https://issues.apache.org/jira/browse/HUDI-2318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pratyaksh Sharma reassigned HUDI-2318:
--------------------------------------
Assignee: Pratyaksh Sharma
> Enhance and stabilize multi-table deltastreamer
> -----------------------------------------------
>
> Key: HUDI-2318
> URL: https://issues.apache.org/jira/browse/HUDI-2318
> Project: Apache Hudi
> Issue Type: Improvement
> Components: Utilities
> Reporter: sivabalan narayanan
> Assignee: Pratyaksh Sharma
> Priority: Major
>
> Currently, the multi-table deltastreamer supports only COW tables, and only in run-once mode. We need to enhance it considerably so that it is usable across all scenarios.
>
> The community has asked for this. A typical use case:
> I have 1000+ tables and want to ingest all of them into Hudi efficiently. I don't want to run 1000+ deltastreamer instances, since I would have to allot resources to each one.
>
> Requirements
> * Add MOR support to the multi-table deltastreamer.
> * Add continuous mode support to the multi-table deltastreamer.
> * Add support for syncing different tables concurrently. As of now, tables are synced serially, which may not work well for 1000+ tables. We may not want to sync all 1000+ tables at once either, but a thread pool would give us a bounded level of concurrency.
> ** Check out [https://github.com/apache/hudi/issues/2175] on ingesting to multiple Hudi tables using Spark Structured Streaming. We can also see whether it can be added as a utility.
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)