Posted to dev@hudi.apache.org by Pratyaksh Sharma <pr...@gmail.com> on 2020/03/20 09:59:50 UTC

Re: [DISCUSS] Consider defaultValue of field when writing to Hudi dataset

https://issues.apache.org/jira/browse/HUDI-727 tracks this.

On Tue, Feb 25, 2020 at 2:23 PM Pratyaksh Sharma <pr...@gmail.com>
wrote:

> Hi Vinoth,
>
> > in avro you define it as an optional field (union of type and null)..
> Yes, that is correct. But imagine someone does not want to populate
> null and instead wants to populate the field's default value, which is
> a very common case.
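To make the distinction concrete, here is a plain-Avro sketch (the schema and field names are made up for the example): a field can be both optional, i.e. a union with null, and still declare a non-null default, and Avro's GenericRecordBuilder applies that default when the producer omits the field.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecordBuilder;

public class DefaultValueDemo {
    // Hypothetical schema: "status" is optional (a union with null) but
    // declares a non-null default instead of defaulting to null.
    static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"status\",\"type\":[\"string\",\"null\"],"
      + "\"default\":\"UNKNOWN\"}]}");

    // Build a record without setting "status"; the builder fills in the
    // declared default rather than leaving the field null.
    static GenericData.Record buildWithDefaults() {
        return new GenericRecordBuilder(SCHEMA).set("id", "e1").build();
    }

    public static void main(String[] args) {
        System.out.println(buildWithDefaults().get("status")); // UNKNOWN
    }
}
```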
>
> > seems like it's being copied over?
> When creating the writer schema, we copy the Avro fields properly.
> However, when actually populating values as per the new writer schema,
> we do not take default values into account, as I pointed out in my
> previous mail.
>
> > Nonetheless, we should copy over the default values, if that code is not
> doing so.
> Yeah, will raise a jira for the same.
>
> >
>
> On Tue, Feb 25, 2020 at 12:21 PM Vinoth Chandar <vi...@apache.org> wrote:
>
>> IIUC, the link between backwards compatibility and default values for
>> fields in a schema is that in Avro you define the field as optional (a
>> union of the type and null). Not sure it has anything to do with
>> default values.
>>
>> Nonetheless, we should copy over the default values, if that code is not
>> doing so.
>>
>> https://github.com/apache/incubator-hudi/blob/078d4825d909b2c469398f31c97d2290687321a8/hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java#L124
>> seems like it's being copied over?
>>
>> On Mon, Feb 24, 2020 at 4:21 AM Pratyaksh Sharma <pr...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > Currently we recommend that users evolve schemas in a
>> > backwards-compatible way. When evolving a schema this way, one of
>> > the most significant steps is to define a default value for newly
>> > added columns, so that records published with the previous schema
>> > can still be consumed properly.
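That guarantee can be sketched with stock Avro, no Hudi involved (the schemas here are hypothetical): a record serialized with the old writer schema is read back with the new reader schema, and Avro's schema resolution fills the newly added column from its declared default.

```java
import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class EvolutionDemo {
    static final Schema WRITER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"}]}");

    // The reader schema adds "source" with a default -- a
    // backwards-compatible change.
    static final Schema READER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
      + "{\"name\":\"id\",\"type\":\"string\"},"
      + "{\"name\":\"source\",\"type\":\"string\",\"default\":\"legacy\"}]}");

    static GenericRecord roundTrip() throws Exception {
        // Serialize a record with the old (writer) schema.
        GenericRecord oldRec = new GenericData.Record(WRITER);
        oldRec.put("id", "e1");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(WRITER).write(oldRec, enc);
        enc.flush();

        // Deserialize with the new (reader) schema; Avro's schema
        // resolution fills the missing column from its default.
        Decoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        return new GenericDatumReader<GenericRecord>(WRITER, READER).read(null, dec);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip().get("source")); // legacy
    }
}
```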
>> >
>> > However, just before actually writing a record to a Hudi dataset,
>> > we rewrite the record with a new Avro schema that includes the Hudi
>> > metadata columns [1]. In this function, we only fetch the values
>> > from the record, without considering each field's default value. As
>> > a result, schema validation fails. In essence, this means we are
>> > not even respecting backwards-compatible schema changes.
>> > IMO, this piece of code should also fall back to the field's
>> > default value when the field's actual value is null.
>> >
>> > Open to hearing others' thoughts.
>> >
>> > [1]
>> >
>> >
>> https://github.com/apache/incubator-hudi/blob/078d4825d909b2c469398f31c97d2290687321a8/hudi-common/src/main/java/org/apache/hudi/common/util/HoodieAvroUtils.java#L205
>> > .
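A minimal sketch of the behaviour being proposed, using illustrative names rather than Hudi's actual rewrite code: when the source record has no value for a field in the target schema, fall back to the field's declared default via GenericData.getDefaultValue instead of writing null.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class RewriteSketch {
    // Illustrative rewrite of a record into a wider target schema.
    // Copying values blindly drops declared defaults; instead, fall
    // back to the field's default for fields the source record does
    // not carry (or carries as null).
    static GenericRecord rewrite(GenericRecord source, Schema newSchema) {
        GenericRecord target = new GenericData.Record(newSchema);
        for (Schema.Field f : newSchema.getFields()) {
            Object value = source.getSchema().getField(f.name()) != null
                ? source.get(f.name())
                : null;
            if (value == null && f.hasDefaultValue()) {
                // Respect the backwards-compatible schema change.
                value = GenericData.get().getDefaultValue(f);
            }
            target.put(f.name(), value);
        }
        return target;
    }
}
```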
>> >
>>
>