Posted to commits@hudi.apache.org by "Sagar Sumit (Jira)" <ji...@apache.org> on 2021/11/20 09:25:00 UTC
[jira] [Resolved] (HUDI-2742) Multiple S3EventsHoodieIncrSource from same S3 metadata table for different Hudi tables
[ https://issues.apache.org/jira/browse/HUDI-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sagar Sumit resolved HUDI-2742.
-------------------------------
> Multiple S3EventsHoodieIncrSource from same S3 metadata table for different Hudi tables
> ---------------------------------------------------------------------------------------
>
> Key: HUDI-2742
> URL: https://issues.apache.org/jira/browse/HUDI-2742
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: Sagar Sumit
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 0.10.0
>
>
> Use case:
> Let's say you have a source bucket which has different folders: a1, a2, a3.
> All write events on this bucket are being logged to the single s3_metadata_table.
> Now you want to run three S3EventsHoodieIncrSource instances, one for each of a1, a2, and a3, all pulling metadata from the same s3_metadata_table.
> This should be done while ensuring that no two incr sources ingest into the same table, i.e. there must be strict separation.
> Proposed Solution:
> Users can provide a filter key and value, and start multiple incr sources with different configs. In the above use case, the key could be s3.object.key and the value could be a regex that matches up to a certain part of the S3 object key. We apply the filter in S3EventsHoodieIncrSource [here|https://github.com/apache/hudi/blob/6b93ccca9b26b47099e9791d4363e0616e77e408/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java#L105-L109].
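
A minimal sketch of the filtering idea described above, in plain Java. It is not the code in S3EventsHoodieIncrSource, and it does not use any Hudi or Spark API. The class name S3KeyFilterSketch, the method filterKeys, and the sample object keys are hypothetical and only illustrate how a regex on the start of the S3 object key could keep a1, a2, and a3 strictly separated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class S3KeyFilterSketch {

    // Hypothetical helper: keeps only the object keys whose beginning
    // matches keyRegex. lookingAt() anchors the match at the start of
    // the key but does not require the whole key to match.
    static List<String> filterKeys(List<String> objectKeys, String keyRegex) {
        Pattern pattern = Pattern.compile(keyRegex);
        List<String> matched = new ArrayList<>();
        for (String key : objectKeys) {
            if (pattern.matcher(key).lookingAt()) {
                matched.add(key);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Assumed sample of object keys recorded in the shared metadata table.
        List<String> keys = Arrays.asList(
                "a1/part-0001.parquet",
                "a2/part-0001.parquet",
                "a3/part-0001.parquet",
                "a1/part-0002.parquet");

        // One regex per incr source; the trailing slash keeps "a1/" from
        // also matching a folder such as "a10/".
        for (String regex : Arrays.asList("a1/", "a2/", "a3/")) {
            System.out.println(regex + " -> " + filterKeys(keys, regex));
        }
    }
}
```

Because each regex matches a distinct folder prefix, every object key is selected by at most one filter, which is the strict separation the use case asks for.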
--
This message was sent by Atlassian Jira
(v8.20.1#820001)