Posted to dev@hudi.apache.org by Purushotham Pushpavanthar <pu...@gmail.com> on 2020/01/06 07:41:31 UTC

[DISCUSS] RFC-14 JDBC incremental puller

Hi everyone,

We are working on introducing a JDBC Delta Streamer as
one of the sources for HUDI. We've drafted an initial version of the design
in RFC-14.
Kindly review it and let us know your thoughts.

I'm initiating this thread to discuss a few comments raised by Vinoth.

   1. As discussed on the RFC page, we are going to support compound
   incremental columns.
   2. As for arbitrary queries (multi-table complex queries), we are
   planning to support them only in Bulk Mode.
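
One way to read "compound incremental columns" is a checkpoint made of more than one column, for example a timestamp plus a tie-breaking id. The sketch below is only an illustration under that assumption, not the design in RFC-14: the `orders` table, its columns, and the `make_demo_db` and `pull_increment` helpers are hypothetical, and Python's built-in sqlite3 module stands in for a real JDBC source.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory table in which several rows share one timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, updated_at INTEGER, payload TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, 100, "a"), (2, 100, "b"), (3, 100, "c"), (4, 200, "d")],
    )
    return conn


def pull_increment(conn, checkpoint, batch_size):
    """Fetch the next batch strictly after the (updated_at, id) checkpoint."""
    last_ts, last_id = checkpoint
    rows = conn.execute(
        "SELECT id, updated_at, payload FROM orders "
        "WHERE updated_at > ? OR (updated_at = ? AND id > ?) "
        "ORDER BY updated_at, id LIMIT ?",
        (last_ts, last_ts, last_id, batch_size),
    ).fetchall()
    if rows:
        # The new checkpoint is taken from the last row of the ordered batch.
        last_row = rows[-1]
        checkpoint = (last_row[1], last_row[0])
    return rows, checkpoint


if __name__ == "__main__":
    conn = make_demo_db()
    checkpoint = (0, 0)
    while True:
        rows, checkpoint = pull_increment(conn, checkpoint, batch_size=2)
        if not rows:
            break
        print(rows, "->", checkpoint)
```

With a checkpoint on `updated_at` alone and a batch size of 2, the row with id 3 would be skipped, because it shares its timestamp with the last row of the first batch. Adding `id` as a second checkpoint column is what lets the next pull resume inside that timestamp.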


[1] https://issues.apache.org/jira/browse/HUDI-251
[2] https://cwiki.apache.org/confluence/display/HUDI/RFC+-+14+%3A+JDBC+incremental+puller

Regards,
Purushotham Pushpavanthar

Re: [DISCUSS] RFC-14 JDBC incremental puller

Posted by Vinoth Chandar <vi...@apache.org>.
My 2c on this: this is a super valuable source for DeltaStreamer, and in
the interest of making forward progress, it is totally fine to make things
like "arbitrary queries" a follow-on item.
Just supporting incremental columns, timestamp columns, and maybe compound
columns would already bring immense value.

My only additional suggestion is to go over the existing PR and see if
there are some implementation aspects that need more upfront design, e.g.,
deriving the checkpoints from extracted data.
Also, if any change to the existing Source or DeltaStreamer framework is
required, it might be best to call this out briefly in the RFC. (I think we
should be ok; just being thorough.)
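
As a rough illustration of what "deriving the checkpoints from extracted data" could involve (an assumption about the intent, not the approach taken in the existing PR), the next checkpoint can be computed from the pulled batch itself rather than from the puller's clock. The `derive_checkpoint` helper and the `updated_at` column are hypothetical.

```python
def derive_checkpoint(rows, column, previous=None):
    """Return the largest value of `column` seen in `rows`.

    Falls back to `previous` when the batch has no usable values, so an
    empty pull never resets the checkpoint, and never returns a value
    smaller than `previous`.
    """
    values = [row[column] for row in rows if row[column] is not None]
    if not values:
        return previous
    newest = max(values)
    return newest if previous is None else max(newest, previous)


if __name__ == "__main__":
    batch = [{"updated_at": 5}, {"updated_at": 9}, {"updated_at": None}]
    print(derive_checkpoint(batch, "updated_at", previous=3))
```

Taking the value from the data means the checkpoint depends only on what the database actually returned, which sidesteps clock differences between the puller and the database. How NULL values and late-arriving rows should be treated is exactly the kind of question that would need upfront design.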

Overall, a big +1 from me


Re: [DISCUSS] RFC-14 JDBC incremental puller

Posted by 蒋晓峰 <pr...@163.com>.
Yeah, I got it. Thanks to Rushiraj for explaining. Good idea.

蒋晓峰
Email: programgeek@163.com


Re: Re: [DISCUSS] RFC-14 JDBC incremental puller

Posted by rushiraj chavan <ru...@gmail.com>.
Hi Nicholas,

We discussed incremental mode for arbitrary queries at our end and came
up with an idea: we can apply a filtering criterion through the WHERE
clause and provide a placeholder for the incremental condition.
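
A minimal sketch of how such a placeholder might work, assuming the user writes the whole query and marks the spot for the incremental condition with a named parameter. The `:last_checkpoint` name, the tables, and the `run_incremental` and `make_demo_db` helpers are all hypothetical, and Python's built-in sqlite3 module stands in for a JDBC connection.

```python
import sqlite3

PLACEHOLDER = ":last_checkpoint"  # hypothetical placeholder name

# A user-supplied multi-table query; only the WHERE clause uses the checkpoint.
USER_QUERY = (
    "SELECT o.id, c.name, o.updated_at "
    "FROM orders o JOIN customers c ON c.id = o.customer_id "
    "WHERE o.updated_at > :last_checkpoint "
    "ORDER BY o.updated_at"
)


def run_incremental(conn, query, checkpoint):
    """Run a user query, binding the checkpoint to its placeholder."""
    if PLACEHOLDER not in query:
        raise ValueError(f"query must contain the {PLACEHOLDER} placeholder")
    # Bind the value as a parameter instead of splicing it into the SQL text.
    return conn.execute(query, {"last_checkpoint": checkpoint}).fetchall()


def make_demo_db():
    """Create two small in-memory tables for the demonstration."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER, customer_id INTEGER, updated_at INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "ann"), (2, "bob")]
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(10, 1, 100), (11, 2, 150), (12, 1, 200)],
    )
    return conn


if __name__ == "__main__":
    print(run_incremental(make_demo_db(), USER_QUERY, 100))
```

The query stays entirely under the user's control; the puller only supplies the checkpoint value on each run. A query without the placeholder is rejected here, since it could not be pulled incrementally.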

Thanks,
Rushiraj


Re: Re: [DISCUSS] RFC-14 JDBC incremental puller

Posted by rushiraj chavan <ru...@gmail.com>.
Hi Nicholas,

I guess it can be made pluggable. Sorry, I didn't understand what you
meant by SPI.

Purushotham (pushpavanthar@gmail.com) and I are working on the JDBC Delta
Streamer. It is WIP.

Thanks,
Rushiraj


Re:Re: [DISCUSS] RFC-14 JDBC incremental puller

Posted by 蒋晓峰 <pr...@163.com>.
Hi Rushiraj,
As you said, a way is needed to pass custom parameters to the arbitrary
queries. Could that mechanism be made pluggable? I think users could then
extend the implementation through SPI. Also, has the work on the JDBC Delta
Streamer already been done?


Best,
Nicholas


Re: [DISCUSS] RFC-14 JDBC incremental puller

Posted by rushiraj chavan <ru...@gmail.com>.
Hi Nicholas,

We haven't given it enough thought. At a high level, it looks a bit hard
to generalise, as we won't have any control over the arbitrary queries.
In that case, the burden would be on the user to configure complex queries
so that they are incremental in nature. We would also need to provide a way
to pass custom parameters to the arbitrary queries, which would be defined
by the user. It is feasible but needs significant design work. We will make
a note of this in our future work.

I hope that makes sense.

Thanks,
Rushiraj


Re:[DISCUSS] RFC-14 JDBC incremental puller

Posted by 蒋晓峰 <pr...@163.com>.
Hi Purushotham,
    Regarding arbitrary queries (multi-table complex queries), why support them only in Bulk Mode? What are the concerns behind this?
Thanks,
Nicholas

