Posted to dev@hudi.apache.org by kaka chen <ka...@gmail.com> on 2019/02/27 07:16:56 UTC

In MergeOnRead mode, when a record is updated more than once in a partition, it does not work.

Hi All,
In MergeOnRead mode, when a record is updated more than once in a partition,
the merged result is wrong.
I found that it uses HoodieAvroPayload, the default value of
"hoodie.compaction.payload.class", whose preCombine method simply returns this:

@Override
public HoodieAvroPayload preCombine(HoodieAvroPayload another) {
    // 'another' is ignored, so the record already held always wins
    return this;
}

As a result, only the first update record in the log file is returned,
whereas the expected behavior is to return the most recent update.
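To make the reported behavior concrete, here is a minimal, self-contained Java sketch. It does not use the Hudi API: Update and mergeLog are hypothetical stand-ins, and the sketch assumes that the records for one key are read from the log oldest-first and merged as existing.preCombine(incoming). Under that assumption, a preCombine that always returns this keeps the first update.

```java
import java.util.Arrays;
import java.util.List;

public class FirstWinsMergeDemo {

    // Hypothetical stand-in for a payload; NOT the Hudi API.
    static final class Update {
        final String value;

        Update(String value) {
            this.value = value;
        }

        // Mirrors the quoted HoodieAvroPayload.preCombine: always keeps 'this'.
        Update preCombine(Update another) {
            return this;
        }
    }

    // Assumed merge order: log records for one key are read oldest-first
    // and combined as existing.preCombine(incoming).
    static Update mergeLog(List<Update> logRecordsForOneKey) {
        Update merged = null;
        for (Update incoming : logRecordsForOneKey) {
            merged = (merged == null) ? incoming : merged.preCombine(incoming);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Update> log = Arrays.asList(new Update("v1"), new Update("v2"), new Update("v3"));
        System.out.println(mergeLog(log).value); // prints "v1", not "v3"
    }
}
```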

Thanks,
Kaka Chen

Re: In MergeOnRead mode, when a record is updated more than once in a partition, it does not work.

Posted by kaka chen <ka...@gmail.com>.
Thanks.

On Wed, Feb 27, 2019 at 3:44 PM nishith agarwal <n3...@gmail.com> wrote:

> Thanks for pointing that out, Kaka. I think the confusion comes from
> HoodieAvroPayload being assigned as the default class.
>
> You could implement your own payload class to achieve this, or take a look at
> https://github.com/uber/hudi/blob/master/hoodie-spark/src/main/java/com/uber/hoodie/OverwriteWithLatestAvroPayload.java
>
> -Nishith

Re: In MergeOnRead mode, when a record is updated more than once in a partition, it does not work.

Posted by nishith agarwal <n3...@gmail.com>.
Thanks for pointing that out, Kaka. I think the confusion comes from
HoodieAvroPayload being assigned as the default class.

You could implement your own payload class to achieve this, or take a look at
https://github.com/uber/hudi/blob/master/hoodie-spark/src/main/java/com/uber/hoodie/OverwriteWithLatestAvroPayload.java
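The custom-payload suggestion can be sketched as follows. This is a self-contained illustration, not Hudi's API: OrderedUpdate and mergeLog are hypothetical, and the merge is assumed to call existing.preCombine(incoming) for each log record of the same key. The idea, similar in spirit to OverwriteWithLatestAvroPayload, is to carry an ordering value with each record (for example an event timestamp or version number) and keep whichever record has the larger one.

```java
import java.util.Arrays;
import java.util.List;

public class LatestWinsMergeDemo {

    // Hypothetical payload; NOT the Hudi API. Each update carries an
    // ordering value such as an event timestamp or version number.
    static final class OrderedUpdate {
        final String value;
        final long orderingVal;

        OrderedUpdate(String value, long orderingVal) {
            this.value = value;
            this.orderingVal = orderingVal;
        }

        // Keep whichever record has the greater ordering value;
        // on a tie, the record already held ('this') is kept.
        OrderedUpdate preCombine(OrderedUpdate another) {
            return another.orderingVal > this.orderingVal ? another : this;
        }
    }

    // Assumed merge order: log records for one key are combined
    // as existing.preCombine(incoming).
    static OrderedUpdate mergeLog(List<OrderedUpdate> logRecordsForOneKey) {
        OrderedUpdate merged = null;
        for (OrderedUpdate incoming : logRecordsForOneKey) {
            merged = (merged == null) ? incoming : merged.preCombine(incoming);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<OrderedUpdate> log = Arrays.asList(
                new OrderedUpdate("v1", 1L),
                new OrderedUpdate("v2", 2L),
                new OrderedUpdate("v3", 3L));
        System.out.println(mergeLog(log).value); // prints "v3"
    }
}
```

Comparing an ordering value, rather than simply returning another, also keeps the result correct if updates for a key reach the log out of order. Ties keep the existing record here; using >= instead of > would let the later-arriving record win on a tie.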

-Nishith
