Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/10/24 22:47:36 UTC

[GitHub] [hudi] vinothchandar commented on pull request #2106: [HUDI-1284] preCombine all HoodieRecords and update all fields according to orderingVal

vinothchandar commented on pull request #2106:
URL: https://github.com/apache/hudi/pull/2106#issuecomment-716065129


   @Karl-WangSK I think we need some more thinking to resolve the pending issues.
   
   >> Only the classes that I just added in this PR will take advantage of it.
   
   I am thinking about how we can safely make this the only API the Hudi code calls. Thinking out loud: existing payload classes will have just `preCombine(payload)` defined and not the new `preCombine(payload, schema)` API method. But the default method should let existing payloads keep working without breaking? (I think so; it would be good to confirm.)
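
A rough sketch of the default-method idea, just to make the question concrete. Everything here is a simplified assumption rather than the actual Hudi interface: `Payload`, `SchemaStub` and `LatestWinsPayload` are hypothetical names, and `SchemaStub` only stands in for a parsed Avro schema.

```java
// Hypothetical stand-in for a parsed schema; NOT the Avro or Hudi type.
final class SchemaStub {
    final String json;

    SchemaStub(String json) {
        this.json = json;
    }
}

// Simplified, hypothetical payload interface (not the real Hudi one).
interface Payload<T extends Payload<T>> {
    // The original single-argument method that existing payloads implement.
    T preCombine(T other);

    // New schema-aware overload. The default delegates to the old method,
    // so payloads written before this overload existed still compile and
    // behave exactly as before.
    default T preCombine(T other, SchemaStub schema) {
        return preCombine(other);
    }
}

// An "existing" payload that only implements the old method.
final class LatestWinsPayload implements Payload<LatestWinsPayload> {
    final long orderingVal;

    LatestWinsPayload(long orderingVal) {
        this.orderingVal = orderingVal;
    }

    @Override
    public LatestWinsPayload preCombine(LatestWinsPayload other) {
        // Keep whichever record has the larger ordering value.
        return other.orderingVal > this.orderingVal ? other : this;
    }
}

class DefaultMethodDemo {
    public static void main(String[] args) {
        LatestWinsPayload a = new LatestWinsPayload(1L);
        LatestWinsPayload b = new LatestWinsPayload(2L);
        // The caller uses only the new API; the default routes to the old one.
        LatestWinsPayload winner = a.preCombine(b, new SchemaStub("{}"));
        System.out.println(winner.orderingVal); // prints 2
    }
}
```

If this holds for the real interface too, the calling code could switch to the two-argument form everywhere, and only payloads that need the schema would override it.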
   
   >> But one problem is that all payload will parse the schema every record whether it needs or not.It will affect performance.
   
   Now, we can think about how to address the perf issue. We can parse the schema once on the driver and then send it across, but the issue is that `Schema` is not serializable. I think we can solve this by wrapping the Schema in a `SerializableAvroSchema` class (see how it's done in `SerializableConfiguration`). This way we send the Schema as a string to the executor and then reconstruct the object there.
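
A minimal sketch of the wrapper idea, under stated assumptions. To keep it runnable without Avro on the classpath, `ParsedSchema` is a hypothetical stand-in for `org.apache.avro.Schema`; with real Avro the two marked calls would presumably be `schema.toString()` and `new Schema.Parser().parse(json)`. The class name `SerializableSchemaSketch` and the `roundTrip` helper are also made up for illustration and are not the eventual `SerializableAvroSchema`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for org.apache.avro.Schema: deliberately NOT Serializable.
final class ParsedSchema {
    private final String json;

    private ParsedSchema(String json) {
        this.json = json;
    }

    // Assumption: plays the role of new Schema.Parser().parse(json).
    static ParsedSchema parse(String json) {
        return new ParsedSchema(json);
    }

    // Assumption: plays the role of Schema#toString(), which yields the JSON form.
    @Override
    public String toString() {
        return json;
    }
}

// Hypothetical wrapper: ships the schema as a string and rebuilds it on the other side.
final class SerializableSchemaSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: default Java serialization skips this field.
    private transient ParsedSchema schema;

    SerializableSchemaSketch(ParsedSchema schema) {
        this.schema = schema;
    }

    ParsedSchema get() {
        return schema;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeObject(schema.toString()); // send only the JSON text
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        schema = ParsedSchema.parse((String) in.readObject()); // parse once per deserialized wrapper
    }

    // Simulates driver-to-executor transfer with an in-memory round trip.
    static SerializableSchemaSketch roundTrip(SerializableSchemaSketch wrapper) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(wrapper);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SerializableSchemaSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}

class SerializableSchemaDemo {
    public static void main(String[] args) {
        String json = "{\"type\":\"string\"}";
        SerializableSchemaSketch copy =
            SerializableSchemaSketch.roundTrip(new SerializableSchemaSketch(ParsedSchema.parse(json)));
        System.out.println(copy.get()); // prints the same JSON text
    }
}
```

The point of the design is that parsing happens once when the wrapper is deserialized, not once per record, so payloads that need the schema can read it from the wrapper at no extra per-record cost.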
   
   Are you able to attempt this? (Otherwise I'll try, but it might take a bit of time. Please let me know.)
   
   
   

