Posted to commits@hudi.apache.org by "Sagar Sumit (Jira)" <ji...@apache.org> on 2021/11/20 09:25:00 UTC

[jira] [Resolved] (HUDI-2742) Multiple S3EventsHoodieIncrSource from same S3 metadata table for different Hudi tables

     [ https://issues.apache.org/jira/browse/HUDI-2742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit resolved HUDI-2742.
-------------------------------

> Multiple S3EventsHoodieIncrSource from same S3 metadata table for different Hudi tables
> ---------------------------------------------------------------------------------------
>
>                 Key: HUDI-2742
>                 URL: https://issues.apache.org/jira/browse/HUDI-2742
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: Sagar Sumit
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.10.0
>
>
> Use case:
> Suppose a source bucket has several folders: a1, a2, a3.
> All write events on this bucket are logged to a single s3_metadata_table.
> You want to run three S3EventsHoodieIncrSource instances, one each for a1, a2, and a3, all pulling metadata from the same s3_metadata_table.
> This must be done with strict separation, i.e. no two incr sources may ingest into the same table.
> Proposed Solution:
> Users can provide a filter key and value, and start multiple incr sources with different configs. In the above use case, the key could be s3.object.key and the value could be a regex that matches up to a certain part of the S3 object key. We apply the filter in S3EventsHoodieIncrSource [here|https://github.com/apache/hudi/blob/6b93ccca9b26b47099e9791d4363e0616e77e408/hudi-utilities/src/main/java/org/apache/hudi/utilities/sources/S3EventsHoodieIncrSource.java#L105-L109].
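
The filtering idea described above can be illustrated with a minimal, standalone sketch. It is not the code in S3EventsHoodieIncrSource and does not use Spark or any Hudi API; the class name KeyFilterSketch, the method filterKeys, and the sample object keys are all hypothetical. It only shows how a per-source regex that matches the beginning of the S3 object key keeps the folders a1, a2, a3 apart.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical illustration only: not part of Hudi.
public class KeyFilterSketch {

    // Keeps the object keys whose beginning matches keyRegex.
    // Matcher.lookingAt() anchors the match at the start of the key
    // but does not require the regex to cover the whole key.
    public static List<String> filterKeys(List<String> objectKeys, String keyRegex) {
        Pattern pattern = Pattern.compile(keyRegex);
        return objectKeys.stream()
                .filter(key -> pattern.matcher(key).lookingAt())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Assumed sample keys, standing in for rows of s3_metadata_table.
        List<String> keys = List.of(
                "a1/part-0001.parquet",
                "a2/part-0001.parquet",
                "a3/part-0001.parquet",
                "a1/2021/part-0002.parquet");

        // One filter value per incr source; the trailing slash keeps
        // a folder such as "a10/" from being picked up by the a1 source.
        for (String regex : List.of("a1/", "a2/", "a3/")) {
            System.out.println(regex + " -> " + filterKeys(keys, regex));
        }
    }
}
```

With filter values chosen so that no object key matches more than one of them, each incr source sees a disjoint slice of the shared metadata table, which is the strict separation the use case asks for.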



--
This message was sent by Atlassian Jira
(v8.20.1#820001)