Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/05/11 05:13:16 UTC
[GitHub] [hudi] yuzhaojing commented on pull request #4309: [HUDI-3016][RFC-43] Proposal to implement Table Management Service

yuzhaojing commented on PR #4309:
URL: https://github.com/apache/hudi/pull/4309#issuecomment-1123194039

   Following the recent discussions and suggestions on the Table Management Service, here is a summary of those discussions, combined with my own ideas, as a long-term plan.
   
   ## Long-term planning
   
   ### Processing mode
   The processing mode depends on whether the meta server is enabled.
   
   - Meta server enabled
     - A pull-based mechanism only works for a small number of tables. Scanning thousands of tables for possible services would induce a heavy listing load.
     - The meta server provides a listener that takes the URIs of the Table Management Service as input and triggers a callback through a hook on each instant commit, thereby calling the Table Management Service to do the scheduling/execution for the table.
   
   - Meta server not enabled
     - For every write/commit on the table, the Table Management Service is notified.
        We can set a heartbeat timeout for each Hudi table; if a table exceeds it, the service actively pulls that table once so that a lost commit request does not go unnoticed.
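
The heartbeat fallback described above can be sketched roughly as follows. This is a minimal illustration, not the proposed implementation: `HeartbeatTracker`, its method names, and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, Dict, List


class HeartbeatTracker:
    """Tracks the last commit notification per table (hypothetical helper)."""

    def __init__(self, timeout_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_seconds = timeout_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self.last_seen: Dict[str, float] = {}

    def on_commit_notification(self, table_path: str) -> None:
        # Called whenever a writer notifies the service about a commit.
        self.last_seen[table_path] = self.clock()

    def tables_to_pull(self) -> List[str]:
        # Tables whose last notification is older than the timeout.
        now = self.clock()
        return sorted(table for table, seen in self.last_seen.items()
                      if now - seen > self.timeout_seconds)

    def mark_pulled(self, table_path: str) -> None:
        # Reset the timer after an active pull so the table is pulled only once.
        self.last_seen[table_path] = self.clock()
```

A periodic task could call `tables_to_pull()`, inspect each returned table's timeline, and then call `mark_pulled()`.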
   
   ### Processing flow
   
   - After receiving the request, the Table Management Service schedules the relevant table service on the table's timeline
   - Persist each table service into an instance table of the Table Management Service
   - Notify a separate execution component/thread that it can start executing the table service
   - Monitor task execution status, update table information, and retry failed table services up to the maximum number of times
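
As a rough sketch of this flow, the snippet below persists a scheduled instance, executes it, and retries failures up to a limit. All names (`ServiceInstance`, `TableManager`, the status strings) are assumptions for illustration. The in-memory list stands in for the instance table, and writing to the table's timeline is left out.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ServiceInstance:
    table_path: str
    action: str            # e.g. "clean" or "compaction"
    instant: str
    status: str = "SCHEDULED"
    attempts: int = 0


class TableManager:
    """Hypothetical coordinator; `execute` returns True on success."""

    def __init__(self, execute: Callable[[ServiceInstance], bool],
                 max_retries: int) -> None:
        self.execute = execute
        self.max_retries = max_retries
        self.instances: List[ServiceInstance] = []  # stands in for the instance table

    def handle_request(self, table_path: str, action: str, instant: str) -> ServiceInstance:
        # Schedule the table service and persist it as an instance.
        instance = ServiceInstance(table_path, action, instant)
        self.instances.append(instance)
        return instance

    def run_pending(self) -> None:
        # Execute each pending instance, retrying failures up to max_retries.
        for instance in self.instances:
            while instance.status in ("SCHEDULED", "RETRYING"):
                instance.attempts += 1
                if self.execute(instance):
                    instance.status = "COMPLETED"
                elif instance.attempts > self.max_retries:
                    instance.status = "FAILED"
                else:
                    instance.status = "RETRYING"
```

Here `max_retries` counts retries after the first attempt, so an instance is executed at most `max_retries + 1` times.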
   
   ### Storage
   
   - There are two types of stored information
     - The Hudi tables registered with the Table Management Service
     - The table service instances generated by the Table Management Service
   
   - For low-latency storage and response, we use MySQL as the default storage for now and keep an abstract interface so that other storage, such as RocksDB, can be implemented later as needed
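
One way to picture the abstract storage interface is sketched below. `MetadataStore` and `InMemoryStore` are hypothetical names, not existing Hudi classes. The dictionary-backed implementation only stands in for a MySQL or RocksDB backend, which would implement the same methods.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple


class MetadataStore(ABC):
    """Hypothetical storage interface covering both kinds of stored information."""

    @abstractmethod
    def register_table(self, table_path: str) -> None:
        """Record a table that the service should manage."""

    @abstractmethod
    def save_instance(self, table_path: str, instant: str,
                      action: str, status: str) -> None:
        """Insert or update one table service instance."""

    @abstractmethod
    def list_instances(self, table_path: str) -> List[Tuple[str, str, str]]:
        """Return (instant, action, status) tuples ordered by instant."""


class InMemoryStore(MetadataStore):
    """Dictionary-backed stand-in for a real backend."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, Tuple[str, str]]] = {}

    def register_table(self, table_path: str) -> None:
        self._tables.setdefault(table_path, {})

    def save_instance(self, table_path: str, instant: str,
                      action: str, status: str) -> None:
        if table_path not in self._tables:
            raise KeyError(f"table not registered: {table_path}")
        self._tables[table_path][instant] = (action, status)

    def list_instances(self, table_path: str) -> List[Tuple[str, str, str]]:
        rows = self._tables.get(table_path, {})
        return [(instant, action, status)
                for instant, (action, status) in sorted(rows.items())]
```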
   
   ### Execution
       
   Provide an abstract Execution Engine that submits table service jobs to Spark / Flink and returns their results
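
A minimal sketch of such an abstraction is shown below, assuming a single `submit` method. `ExecutionEngine`, `ExecutionResult`, `RecordingEngine`, and `pick_engine` are illustrative names only. The recording engine does not talk to Spark or Flink; it only shows where an engine-specific implementation would plug in.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ExecutionResult:
    success: bool
    detail: str


class ExecutionEngine(ABC):
    """Hypothetical engine interface; one subclass per engine type."""

    @abstractmethod
    def submit(self, table_path: str, action: str, instant: str) -> ExecutionResult:
        """Submit one table service job and report the outcome."""


class RecordingEngine(ExecutionEngine):
    """Stand-in engine that records submissions instead of launching a job."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.submitted: List[Tuple[str, str, str]] = []

    def submit(self, table_path: str, action: str, instant: str) -> ExecutionResult:
        self.submitted.append((table_path, action, instant))
        return ExecutionResult(True, f"{self.name} accepted {action}@{instant}")


def pick_engine(engines: Dict[str, ExecutionEngine], engine_type: str) -> ExecutionEngine:
    # Select the configured engine, failing loudly on unknown types.
    try:
        return engines[engine_type]
    except KeyError:
        raise ValueError(f"unsupported engine: {engine_type}") from None
```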
       
   ### Monitoring and Alarm
       
   Expose metrics of the Table Management Service, such as QPS, scheduling time, and submission time, and raise an alarm when a task fails
       
   ### API
   
   - Support REST and gRPC
   - Implement API endpoints for the CLI and the writer
      
   ### CLI
      
   Provide commands to operate the Table Management Service, such as list all instances, add an instance, remove an instance, and clear jobs for a table
      
   ### Writer
   
   - Meta server enabled
     - Commit the instant to the meta server and skip any scheduling + execution of table services
   - Meta server not enabled
     - Commit the instant and send a request to the Table Management Service for scheduling + execution of table services
      
   ### Multiple instances
      
   Table Management Service instances are stateless. Each instance processes commit requests (scheduling + execution of table services), and the meta server or ZK locking guarantees that the same table service is not scheduled more than once
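
The non-repeated scheduling guarantee can be illustrated with a claim-once guard. The sketch below uses an in-process `threading.Lock` purely as a stand-in for the meta server or a ZK lock. `SchedulingGuard` and `try_claim` are hypothetical names, and a real deployment would need a lock and claim record shared across service instances.

```python
import threading
from typing import Set, Tuple


class SchedulingGuard:
    """Ensures each (table, instant) pair is claimed for scheduling only once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # stand-in for a distributed lock
        self._claimed: Set[Tuple[str, str]] = set()

    def try_claim(self, table_path: str, instant: str) -> bool:
        # Returns True for the first caller only; later callers must skip scheduling.
        key = (table_path, instant)
        with self._lock:
            if key in self._claimed:
                return False
            self._claimed.add(key)
            return True
```

Each service instance would call `try_claim` before scheduling and proceed only when it returns `True`.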
   
   ## Short-term plan
   
   Since the meta server is not yet implemented, and the current asynchronous scheduler service requires an additional dependency (ZK), the functions planned for the first phase have been modified so that they can be implemented quickly
   
   1. Schedule table services in the writer, which reduces the dependencies required of users of the Table Management Service. After the meta server is implemented, it can be relied on to schedule table services asynchronously.
   2. Request the Table Management Service at execution time (for example, HoodieWriteClient.clean(...) will detect that TableServices are enabled and, instead of running clean, will schedule it with the TableService by calling the API endpoint).
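
The delegation in item 2 can be sketched as below. This is not the actual `HoodieWriteClient` code: `WriteClientSketch`, its constructor parameters, and the return values are assumptions made for illustration. The two callbacks stand in for the API endpoint call and the existing inline clean.

```python
from typing import Callable


class WriteClientSketch:
    """Hypothetical writer that delegates clean when the service is enabled."""

    def __init__(self, table_path: str, service_enabled: bool,
                 request_service: Callable[[str, str, str], None],
                 run_clean_inline: Callable[[str], None]) -> None:
        self.table_path = table_path
        self.service_enabled = service_enabled
        self.request_service = request_service    # stands in for the API endpoint call
        self.run_clean_inline = run_clean_inline  # stands in for the existing inline clean

    def clean(self, instant: str) -> str:
        if self.service_enabled:
            # Hand the clean over to the Table Management Service instead of running it.
            self.request_service(self.table_path, "clean", instant)
            return "DELEGATED"
        self.run_clean_inline(instant)
        return "EXECUTED_INLINE"
```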
      
   ## Phase1
   
   Implement the basic functional parts of the long-term plan without enabling the meta server
   
   1. Processing mode + processing flow
   2. Storage
   3. Execution (Spark only)
   4. Monitoring (expose only basic indicators such as success and failure)
   5. API (REST only)
   6. Writer
   
   **Landing plan: 0.12**
   
   ## Phase2
   
   Implement the integration with the meta server and improve some capabilities
   
   1. Improve monitoring and alarm indicators and provide alarm functions
   2. API (gRPC)
   3. Schedule table services in the Table Management Service (with the meta server)
   4. Multi-instance implementation
   
   **Landing plan: 1.0**

