Posted to dev@hudi.apache.org by Pratyaksh Sharma <pr...@gmail.com> on 2020/03/21 09:35:28 UTC

Re: [DISCUSS] Support for complex record keys with TimestampBasedKeyGenerator

@Balaji @Vinoth Chandar <vi...@apache.org>,

Here is a small attempt at making this generic:
https://github.com/apache/incubator-hudi/pull/1433/files. Please have a
look; happy to hear from everyone on this.

This is just a sample. If we agree on the implementation, I will add more
test cases and improve it further.

On Thu, Feb 27, 2020 at 9:43 PM Vinoth Chandar <vi...@apache.org> wrote:

> +1 for adding a new composite KeyGenerator, which can combine both...
>
> Workaround: for DeltaStreamer, you can also use the Transformer API to do
> more flexible key generation as you wish.
>
> On Tue, Feb 25, 2020 at 9:37 AM Balaji Varadarajan
> <v....@ymail.com.invalid> wrote:
>
> >
> > See if you can have a generic implementation where individual fields in
> > the partition path can be configured with their own key-generator class.
> > Currently, TimestampBasedKeyGenerator is the only type-specific custom
> > generator. If we are anticipating more such classes for specialized
> > types, you can use a generic way to support overriding the key generator
> > for individual partition fields once and for all.
> > Balaji.V
> >
> > On Monday, February 24, 2020, 03:09:02 AM PST, Pratyaksh Sharma
> > <pr...@gmail.com> wrote:
> >
> > Hi,
> >
> > We have TimestampBasedKeyGenerator for defining custom partition paths,
> > and ComplexKeyGenerator for using a combination of fields as the record
> > key or partition key.
> >
> > However, we do not support the case where one wants a combination of
> > fields as the record key while also being able to define custom partition
> > paths. This use case recently came up at my organisation.
> >
> > How about adding a CustomTimestampBasedKeyGenerator that supports the
> > above use case? This class can simply extend TimestampBasedKeyGenerator
> > and allow users to use a combination of fields as the record key.
> >
> > Open to hearing others' opinions.
> >
>
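To make the proposal and Balaji's suggestion concrete, here is a minimal, self-contained Java sketch of a composite key generator that combines both ideas: a record key built from several fields (as ComplexKeyGenerator does) and a partition path in which each field is configured with its own generator type. It deliberately does not use Hudi's actual classes or configs. The class name CompositeKeySketch, the "field:TYPE" spec syntax, the SIMPLE/TIMESTAMP type names, the "field:value" record-key format, and the UTC yyyy/MM/dd timestamp output are all hypothetical, chosen only for illustration.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch, NOT Hudi's API: the record key combines several fields,
 * and each partition field is configured with its own generator type via a
 * spec such as "region:SIMPLE,event_ts:TIMESTAMP".
 */
final class CompositeKeySketch {

    // Hypothetical per-field partition generator types.
    enum PartitionType { SIMPLE, TIMESTAMP }

    // Assumed TIMESTAMP behavior: epoch millis rendered as yyyy/MM/dd in UTC.
    private static final DateTimeFormatter DATE_PATH =
            DateTimeFormatter.ofPattern("yyyy/MM/dd").withZone(ZoneOffset.UTC);

    /** Joins "field:value" pairs with commas (assumed record-key format). */
    static String recordKey(Map<String, Object> record, List<String> keyFields) {
        List<String> parts = new ArrayList<>();
        for (String field : keyFields) {
            parts.add(field + ":" + record.get(field));
        }
        return String.join(",", parts);
    }

    /** Builds the partition path, applying each field's configured type in order. */
    static String partitionPath(Map<String, Object> record, String spec) {
        List<String> segments = new ArrayList<>();
        for (String entry : spec.split(",")) {
            String[] fieldAndType = entry.trim().split(":");
            String field = fieldAndType[0];
            PartitionType type = PartitionType.valueOf(fieldAndType[1]);
            Object value = record.get(field);
            switch (type) {
                case SIMPLE:
                    // Use the raw field value as the path segment.
                    segments.add(String.valueOf(value));
                    break;
                case TIMESTAMP:
                    // Interpret the value as epoch millis and format it as a date path.
                    long epochMillis = ((Number) value).longValue();
                    segments.add(DATE_PATH.format(Instant.ofEpochMilli(epochMillis)));
                    break;
            }
        }
        return String.join("/", segments);
    }

    public static void main(String[] args) {
        Map<String, Object> record = Map.of(
                "order_id", "o-42",
                "customer_id", "c-7",
                "region", "emea",
                "event_ts", 1582502400000L); // 2020-02-24T00:00:00Z
        System.out.println(recordKey(record, List.of("order_id", "customer_id")));
        System.out.println(partitionPath(record, "region:SIMPLE,event_ts:TIMESTAMP"));
    }
}
```

Running main prints order_id:o-42,customer_id:c-7 followed by emea/2020/02/24. The design point is that adding a new specialized type means adding one enum constant and one case, rather than a new subclass per combination of record-key and partition-path behavior.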

Re: [DISCUSS] Support for complex record keys with TimestampBasedKeyGenerator

Posted by Vinoth Chandar <vi...@apache.org>.
Hi Pratyaksh,

Thanks for opening this. Will review and get back to you!

Thanks
Vinoth
