Posted to commits@hudi.apache.org by "Jing Zhang (Jira)" <ji...@apache.org> on 2023/02/03 10:35:00 UTC
[jira] [Commented] (HUDI-3217) RFC-46: Optimize Record Payload handling
[ https://issues.apache.org/jira/browse/HUDI-3217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17683818#comment-17683818 ]
Jing Zhang commented on HUDI-3217:
----------------------------------
[~alexey.kudinkin] [~wzx] [~minihippo] Nice work!
Is there any plan to apply this optimization to the Flink integration?
I would like to contribute to this improvement.
> RFC-46: Optimize Record Payload handling
> ----------------------------------------
>
> Key: HUDI-3217
> URL: https://issues.apache.org/jira/browse/HUDI-3217
> Project: Apache Hudi
> Issue Type: Epic
> Components: storage-management, writer-core
> Reporter: Alexey Kudinkin
> Assignee: Alexey Kudinkin
> Priority: Critical
> Labels: hudi-umbrellas, pull-request-available
> Fix For: 0.13.0
>
>
> Currently Hudi is biased towards the assumption of a particular payload representation (Avro). Long-term, we would like to steer away from this and keep the record payload completely opaque, so that
> # We can keep record payload representation engine-specific
> # We can avoid unnecessary serde loops (Engine-specific > Avro > Engine-specific > Binary)
> h2. *Proposal*
>
> *Phase 2: Revisiting Record Handling*
> {_}T-shirt{_}: 2-2.5 weeks
> {_}Goal{_}: Avoid tight coupling with a particular record representation on the Read Path (currently Avro) and enable
> * Revisit RecordPayload APIs
> ** Deprecate the {{getInsertValue}} and {{combineAndGetUpdateValue}} APIs, replacing them with new “opaque” APIs (not returning Avro payloads)
> ** Rebase RecordPayload hierarchy to be engine-specific:
> *** Common engine-specific base abstracting common functionality (Spark, Flink, Java)
> *** Each feature-specific semantic will have to be implemented for all engines
> ** Introduce new APIs
> *** To access keys (record, partition)
> *** To convert record to Avro (for BWC)
> * Revisit RecordPayload handling
> ** In WriteHandles
> *** API will be accepting opaque RecordPayload (no Avro conversion)
> *** Can do (opaque) record merging if necessary
> *** Passes RecordPayload as-is to FileWriter
> ** In FileWriters
> *** Will accept RecordPayload interface
> *** Should be engine-specific (to handle internal record representation)
> ** In RecordReaders
> *** API will be providing opaque RecordPayload (no Avro conversion)
>
> REF
> [https://app.clickup.com/18029943/v/dc/h67bq-1900/h67bq-6680]
>
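For readers following the thread, the API direction in the quoted proposal can be illustrated with a minimal Java sketch. Everything named here ({{OpaqueRecord}}, {{ArrayRowRecord}}, {{OpaqueFileWriter}}, {{SketchWriteHandle}}) is hypothetical and is not Hudi's actual API. A plain {{Object[]}} stands in for an engine-native row, a {{Map}} stands in for the Avro conversion kept for backward compatibility, and "latest wins" is assumed as the merge semantic.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical opaque record: exposes keys and merging without forcing an Avro round trip.
interface OpaqueRecord<T> {
    String recordKey();
    String partitionPath();
    T data();                                      // engine-native representation
    OpaqueRecord<T> merge(OpaqueRecord<T> newer);  // opaque merging
    Map<String, Object> toGenericMap();            // stand-in for "convert record to Avro (for BWC)"
}

// Hypothetical engine-specific record; Object[] stands in for an engine's internal row.
final class ArrayRowRecord implements OpaqueRecord<Object[]> {
    private final String key;
    private final String partition;
    private final String[] fieldNames;
    private final Object[] row;

    ArrayRowRecord(String key, String partition, String[] fieldNames, Object[] row) {
        this.key = key;
        this.partition = partition;
        this.fieldNames = fieldNames;
        this.row = row;
    }

    public String recordKey() { return key; }
    public String partitionPath() { return partition; }
    public Object[] data() { return row; }

    // Assumed merge semantic for this sketch: the newer record simply wins.
    public OpaqueRecord<Object[]> merge(OpaqueRecord<Object[]> newer) { return newer; }

    // Conversion happens only when a caller explicitly asks for it.
    public Map<String, Object> toGenericMap() {
        Map<String, Object> out = new LinkedHashMap<>();
        for (int i = 0; i < fieldNames.length; i++) {
            out.put(fieldNames[i], row[i]);
        }
        return out;
    }
}

// Hypothetical engine-specific file writer that accepts the record interface directly.
interface OpaqueFileWriter<T> {
    void write(OpaqueRecord<T> record);
}

final class InMemoryRowWriter implements OpaqueFileWriter<Object[]> {
    final List<Object[]> written = new ArrayList<>();

    public void write(OpaqueRecord<Object[]> record) {
        written.add(record.data()); // native row is stored as-is, no intermediate format
    }
}

// Hypothetical write handle: merges by record key, then passes records as-is to the writer.
final class SketchWriteHandle<T> {
    private final Map<String, OpaqueRecord<T>> pending = new LinkedHashMap<>();
    private final OpaqueFileWriter<T> writer;

    SketchWriteHandle(OpaqueFileWriter<T> writer) { this.writer = writer; }

    void accept(OpaqueRecord<T> record) {
        pending.merge(record.recordKey(), record, (older, newer) -> older.merge(newer));
    }

    void close() {
        for (OpaqueRecord<T> record : pending.values()) {
            writer.write(record);
        }
        pending.clear();
    }
}

class OpaqueRecordDemo {
    public static void main(String[] args) {
        InMemoryRowWriter writer = new InMemoryRowWriter();
        SketchWriteHandle<Object[]> handle = new SketchWriteHandle<>(writer);
        String[] fields = {"id", "price"};
        handle.accept(new ArrayRowRecord("k1", "2023/02/03", fields, new Object[] {"k1", 10}));
        handle.accept(new ArrayRowRecord("k1", "2023/02/03", fields, new Object[] {"k1", 12}));
        handle.close();
        System.out.println(writer.written.size() + " row(s) written");
    }
}
```

In this sketch the write handle and the file writer never see a concrete representation. Only the engine-specific record class knows the row layout, which is the kind of decoupling the proposal aims for. A Flink variant would, under the same assumptions, supply its own record and writer implementations.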