Posted to commits@hudi.apache.org by "Yixue (Andrew) Zhu (Jira)" <ji...@apache.org> on 2020/05/14 19:27:00 UTC

[jira] [Created] (HUDI-898) Need to add Schema parameter to HoodieRecordPayload::preCombine

Yixue (Andrew) Zhu created HUDI-898:
---------------------------------------

             Summary: Need to add Schema parameter to HoodieRecordPayload::preCombine
                 Key: HUDI-898
                 URL: https://issues.apache.org/jira/browse/HUDI-898
             Project: Apache Hudi (incubating)
          Issue Type: Improvement
          Components: Common Core
            Reporter: Yixue (Andrew) Zhu


We are working on Mongo Oplog integration with Hudi, to stream Mongo updates to Hudi tables.

There are 4 Mongo Oplog operations we need to handle: CRUD (create, read, update, delete).

Currently Hudi handles create/read and delete, but the existing preCombine API in the HoodieRecordPayload class does not handle update well. In particular, an update operation contains a "patch" field, which is extended JSON describing the update in terms of dot-separated field paths.
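
To illustrate why an update differs from a create or delete, here is a minimal sketch of applying such a patch to a nested record. It assumes the patch has already been parsed into a map from dot-separated path to new value. The class and method names (DotPathPatchSketch, applyPatch) are hypothetical and are not part of Hudi or the Mongo driver.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DotPathPatchSketch {

    /**
     * Applies each (dot-separated path -> new value) entry of the patch to a
     * nested map, in place. Intermediate maps are created when a path segment
     * is missing; a non-map value found at an intermediate segment is replaced.
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> applyPatch(Map<String, Object> doc, Map<String, Object> patch) {
        for (Map.Entry<String, Object> entry : patch.entrySet()) {
            String[] segments = entry.getKey().split("\\.");
            Map<String, Object> cursor = doc;
            // Walk down to the parent of the last segment.
            for (int i = 0; i < segments.length - 1; i++) {
                Object child = cursor.get(segments[i]);
                if (!(child instanceof Map)) {
                    child = new LinkedHashMap<String, Object>();
                    cursor.put(segments[i], child);
                }
                cursor = (Map<String, Object>) child;
            }
            // Set the leaf value named by the last segment.
            cursor.put(segments[segments.length - 1], entry.getValue());
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Oslo");
        address.put("zip", "0150");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "a");
        doc.put("address", address);

        // Hypothetical patch: only the changed path is present.
        Map<String, Object> patch = new LinkedHashMap<>();
        patch.put("address.city", "Bergen");

        System.out.println(applyPatch(doc, patch));
    }
}
```

Because the patch carries only the changed paths, combining it with an earlier record means merging field by field rather than picking one of the two records, and that merge needs to know the record's structure.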

We need to pass the Avro schema to the preCombine API for this to work.

Although the BaseAvroPayload constructor accepts a GenericRecord, which carries a reference to its Avro schema, it materializes the GenericRecord to bytes to support serialization/deserialization by ExternalSpillableMap, so the schema is no longer available from the payload when preCombine is called.
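
The following is a toy, standard-library-only sketch of the problem and of the shape of the requested change. It is not Hudi or Avro code. The payload stores field values as bytes in schema order with no field names, a List<String> of field names stands in for the Avro schema, and the two-argument preCombine is a hypothetical signature along the lines this issue asks for. An empty value is assumed to mean "not changed by this update".

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SchemaAwarePreCombineSketch {

    /** Toy payload that keeps only encoded bytes, with no schema reference. */
    static final class BytesPayload {
        final byte[] bytes;

        BytesPayload(byte[] bytes) {
            this.bytes = bytes;
        }

        /**
         * Hypothetical schema-aware preCombine: the caller supplies the schema,
         * so both payloads can be decoded and merged field by field.
         */
        BytesPayload preCombine(BytesPayload newer, List<String> schema) {
            Map<String, String> merged = decode(this.bytes, schema);
            for (Map.Entry<String, String> e : decode(newer.bytes, schema).entrySet()) {
                // Assumption: an empty value means the update did not touch the field.
                if (!e.getValue().isEmpty()) {
                    merged.put(e.getKey(), e.getValue());
                }
            }
            return new BytesPayload(encode(merged, schema));
        }
    }

    /** Writes values in schema order, newline-separated; field names are not stored. */
    static byte[] encode(Map<String, String> record, List<String> schema) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < schema.size(); i++) {
            if (i > 0) {
                sb.append('\n');
            }
            sb.append(record.getOrDefault(schema.get(i), ""));
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Reads values back; without the schema the bytes cannot be mapped to fields. */
    static Map<String, String> decode(byte[] bytes, List<String> schema) {
        // Limit -1 keeps trailing empty values.
        String[] values = new String(bytes, StandardCharsets.UTF_8).split("\n", -1);
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < schema.size(); i++) {
            record.put(schema.get(i), i < values.length ? values[i] : "");
        }
        return record;
    }
}
```

With a single-argument preCombine, the toy payload above would have no way to tell which value belongs to which field, which is the gap the schema parameter is meant to close.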

 

Is there any concern or objection to this? In other words, have I overlooked something?

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)