Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/08/18 19:43:00 UTC

[jira] [Created] (HUDI-2318) Enhance and stabilize multi-table deltastreamer

sivabalan narayanan created HUDI-2318:
-----------------------------------------

             Summary: Enhance and stabilize multi-table deltastreamer
                 Key: HUDI-2318
                 URL: https://issues.apache.org/jira/browse/HUDI-2318
             Project: Apache Hudi
          Issue Type: Improvement
          Components: Utilities
            Reporter: sivabalan narayanan


Currently, the multi-table deltastreamer supports only COW tables, and only in run-once mode. We need to enhance it considerably so that it is usable in more scenarios.

 

The community has asked for this. A typical use case:

I have 1000+ tables and I wish to ingest all of them into Hudi in parallel. I don't want to run 1000+ deltastreamer instances, because I would have to allot resources to every one of them.

 

Requirements
 * Add MOR support to the multi-table deltastreamer.
 * Add continuous mode support to the multi-table deltastreamer.
 * Add support for syncing different tables concurrently. As of now, each table is synced serially, which may not work out well for 1000+ tables. A thread pool is one option.
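
To make the thread-pool option concrete, here is a minimal sketch in plain Java. It is not Hudi's API: the class name ConcurrentTableSync, the method syncAll, and the syncOnce callback are all hypothetical, and syncOnce stands in for whatever a single sync round for one table would do. The idea is only that a bounded pool runs many table syncs at once without needing one deltastreamer instance per table, and that a failure in one table does not stop the others.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from Hudi.
class ConcurrentTableSync {

    // Runs syncOnce for every table on a fixed-size pool and returns a
    // per-table success flag, in the same order as the input list.
    static Map<String, Boolean> syncAll(List<String> tables,
                                        Consumer<String> syncOnce,
                                        int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Submit one task per table; at most poolSize run at a time.
            Map<String, Future<?>> futures = new LinkedHashMap<>();
            for (String table : tables) {
                futures.put(table, pool.submit(() -> syncOnce.accept(table)));
            }
            // Wait for each task and record whether it completed normally.
            Map<String, Boolean> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<?>> entry : futures.entrySet()) {
                try {
                    entry.getValue().get();
                    results.put(entry.getKey(), true);
                } catch (ExecutionException e) {
                    // The sync for this table threw; the others keep going.
                    results.put(entry.getKey(), false);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and mark the table as not synced.
                    Thread.currentThread().interrupt();
                    results.put(entry.getKey(), false);
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("orders", "users", "payments");
        // Placeholder sync: pretend the "users" table fails.
        Map<String, Boolean> results = syncAll(tables, table -> {
            if (table.equals("users")) {
                throw new IllegalStateException("simulated sync failure");
            }
        }, 2);
        System.out.println(results);
    }
}
```

The pool size caps how many tables are synced at once, so resources scale with the pool rather than with the number of tables. A continuous mode could, under the same assumptions, call syncAll in a loop or resubmit each table's task when it finishes.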

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)