Posted to dev@hudi.apache.org by Vinoth Chandar <ma...@gmail.com> on 2023/04/03 15:31:18 UTC

Re: [DISCUSS] Hudi Reverse Streamer

+1

I was thinking that we add a new utility and NOT extend DeltaStreamer by
adding a Sink interface, for the following reasons:

- It will make it look like a generic Source => Sink ETL tool, which is
actually not our intention to support on Hudi. There are plenty of good
tools for that out there.
- The config management can get a bit hard to understand, since we would
overload ingest and reverse ETL into a single tool. So break it off at the
use-case level?

Thoughts?

David: The PMC does not have control over that. Please see the unsubscribe
instructions here: https://hudi.apache.org/community/get-involved
I'd love to keep this thread about the reverse streamer discussion, so
kindly fork another thread if you want to discuss unsubscribing.

On Fri, Mar 31, 2023 at 1:47 AM Davidiam <da...@gmail.com> wrote:

> Hello Vinoth,
>
> Can you please unsubscribe me?  I have been trying to unsubscribe for
> months without success.
>
> Kind Regards,
> David
>
> ________________________________
> From: Vinoth Chandar <vi...@apache.org>
> Sent: Friday, March 31, 2023 5:09:52 AM
> To: dev <de...@hudi.apache.org>
> Subject: [DISCUSS] Hudi Reverse Streamer
>
> Hi all,
>
> Any interest in building a reverse streaming tool that does the reverse of
> what the DeltaStreamer tool does? It would read a Hudi table incrementally
> (as its only source) and write the data out to a variety of sinks: Kafka,
> JDBC databases, DFS.
>
> This has come up many times with data warehouse users. Often, they want to
> use Hudi to speed up or reduce costs on their data ingestion and ETL (using
> Spark/Flink), but want to move the derived data back into a data warehouse
> or an operational database for serving.
>
> What do you all think?
>
> Thanks
> Vinoth
>
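
The proposal quoted above (read a table incrementally, push the new records to
a pluggable sink) can be illustrated with a small, self-contained sketch. This
is only an illustration under stated assumptions, not the design from the
thread or any Hudi API: the table is simulated as an in-memory list, and every
name here (Record, Sink, InMemorySink, read_incremental, run_once) is
hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class Record:
    """One row of the simulated table (hypothetical shape)."""
    commit_time: str  # sortable commit instant, e.g. "20230403153118"
    key: str
    value: dict


class Sink(Protocol):
    """Hypothetical sink contract; real sinks might be Kafka, JDBC, or DFS."""
    def write(self, records: list[Record]) -> None: ...


@dataclass
class InMemorySink:
    """Stand-in sink that just collects what it is given."""
    rows: list[Record] = field(default_factory=list)

    def write(self, records: list[Record]) -> None:
        self.rows.extend(records)


def read_incremental(table: list[Record], checkpoint: str) -> list[Record]:
    """Return records committed strictly after the checkpoint, oldest first."""
    newer = [r for r in table if r.commit_time > checkpoint]
    return sorted(newer, key=lambda r: r.commit_time)


def run_once(table: list[Record], sink: Sink, checkpoint: str) -> str:
    """Ship new records to the sink and return the advanced checkpoint."""
    batch = read_incremental(table, checkpoint)
    if not batch:
        return checkpoint  # nothing new; checkpoint stays where it was
    sink.write(batch)
    return batch[-1].commit_time
```

Keeping the checkpoint outside the sink means the same loop could drive any
sink implementation, which is one way to keep such a tool separate from the
ingest path rather than adding a Sink interface to DeltaStreamer.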

Re: [DISCUSS] Hudi Reverse Streamer

Posted by Pratyaksh Sharma <pr...@gmail.com>.
Hi Vinoth,

I have raised a PR here - https://github.com/apache/hudi/pull/9492.
Let us continue the discussion there.

On Wed, Aug 16, 2023 at 4:43 PM Vinoth Chandar <
mail.vinoth.chandar@gmail.com> wrote:

> Hi Pratyaksh,
>
> Are you still actively driving this?

Re: [DISCUSS] Hudi Reverse Streamer

Posted by Vinoth Chandar <ma...@gmail.com>.
Hi Pratyaksh,

Are you still actively driving this?

On Tue, Jul 11, 2023 at 2:18 PM Pratyaksh Sharma <pr...@gmail.com>
wrote:

> Update: I will be raising the initial draft of the RFC in the next couple
> of days.

Re: [DISCUSS] Hudi Reverse Streamer

Posted by Pratyaksh Sharma <pr...@gmail.com>.
Update: I will be raising the initial draft of the RFC in the next couple of
days.

On Thu, Jun 15, 2023 at 2:28 AM Rajesh Mahindra <rm...@gmail.com> wrote:

> Great. We also need it for use cases of loading data into warehouses, and
> would love to help.

Re: [DISCUSS] Hudi Reverse Streamer

Posted by Rajesh Mahindra <rm...@gmail.com>.
Great. We also need it for use cases of loading data into warehouses, and
would love to help.

On Wed, Jun 14, 2023 at 9:06 AM Pratyaksh Sharma <pr...@gmail.com>
wrote:

> Hi,
>
> I missed this email earlier. Sure, let me start an RFC this week and we can
> take it from there.


-- 
Take Care,
Rajesh Mahindra

Re: [DISCUSS] Hudi Reverse Streamer

Posted by Pratyaksh Sharma <pr...@gmail.com>.
Hi,

I missed this email earlier. Sure, let me start an RFC this week and we can
take it from there.

On Wed, Jun 14, 2023 at 9:20 PM Nicolas Paris <ni...@riseup.net>
wrote:

> Hi, are there any RFCs or ongoing efforts on the reverse delta streamer? We
> have a use case for Hudi => Kafka and would enjoy building a more general
> tool.
>
> However, we need an RFC as a basis to start the effort in the right way.

Re: [DISCUSS] Hudi Reverse Streamer

Posted by Nicolas Paris <ni...@riseup.net>.
Hi, are there any RFCs or ongoing efforts on the reverse delta streamer? We have a use case for Hudi => Kafka and would enjoy building a more general tool.

However, we need an RFC as a basis to start the effort in the right way.

On April 12, 2023 3:08:22 AM UTC, Vinoth Chandar <ma...@gmail.com> wrote:
>Cool. Let's draw up an RFC for this? @pratyaksh - do you want to start one,
>given you expressed interest?

Re: [DISCUSS] Hudi Reverse Streamer

Posted by Vinoth Chandar <ma...@gmail.com>.
Cool. Let's draw up an RFC for this? @pratyaksh - do you want to start one,
given you expressed interest?

On Mon, Apr 10, 2023 at 7:32 PM Léo Biscassi <le...@gmail.com> wrote:

> +1
> This would be great!
>
> Cheers,

Re: [DISCUSS] Hudi Reverse Streamer

Posted by Léo Biscassi <le...@gmail.com>.
+1
This would be great!

Cheers,

On Mon, Apr 3, 2023 at 3:00 PM Pratyaksh Sharma <pr...@gmail.com>
wrote:

> Hi Vinoth,
>
> I am aligned with the first reason that you mentioned. Better to have a
> separate tool to take care of this.


-- 
*Léo Biscassi*
Blog - https://leobiscassi.com


Re: [DISCUSS] Hudi Reverse Streamer

Posted by Pratyaksh Sharma <pr...@gmail.com>.
Hi Vinoth,

I am aligned with the first reason that you mentioned. Better to have a
separate tool to take care of this.

On Mon, Apr 3, 2023 at 9:01 PM Vinoth Chandar <ma...@gmail.com>
wrote:

> +1
>
> I was thinking that we add a new utility and NOT extend DeltaStreamer by
> adding a Sink interface, for the following reasons:
>
> - It will make it look like a generic Source => Sink ETL tool, which is
> actually not our intention to support on Hudi. There are plenty of good
> tools for that out there.
> - The config management can get a bit hard to understand, since we would
> overload ingest and reverse ETL into a single tool. So break it off at the
> use-case level?
>
> Thoughts?
>
> David: The PMC does not have control over that. Please see the unsubscribe
> instructions here: https://hudi.apache.org/community/get-involved
> I'd love to keep this thread about the reverse streamer discussion, so
> kindly fork another thread if you want to discuss unsubscribing.