Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/06/07 02:28:51 UTC

[GitHub] [hudi] wangxianghu commented on pull request #2963: [HUDI-1904] Make SchemaProvider spark free and move it to hudi-client-common

wangxianghu commented on pull request #2963:
URL: https://github.com/apache/hudi/pull/2963#issuecomment-855532282


   > Let me summarize.
   > The existing SchemaProvider has a constructor specific to the Spark engine (TypedProperties and JavaSparkContext),
   > and the objective is to reuse the SchemaProvider across all engines.
   > 
   > As I mentioned before, this may not be easy.
   > One option I can think of is to introduce a new interface called SchemaProviderInterface.
   > 
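   > ```java
   > import java.io.Serializable;
   > import org.apache.avro.Schema;
   > 
   > // An interface extends (not implements) Serializable; methods are implicitly public and abstract.
   > public interface SchemaProviderInterface extends Serializable {
   >   Schema getSourceSchema();
   >   Schema getTargetSchema();
   > }
   > ```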
   > 
   > Make the existing SchemaProvider implement this new interface. The existing schema provider then stays as is, without many changes, and remains meant only for the Spark engine; none of the classes inheriting from the SchemaProvider abstract class need any changes.
   > If we want to, we can introduce a similar class for each engine.
   > 
   > But I am trying to see how much of a benefit we would get out of it.
   > Do you know how many existing schema provider implementations have no dependency on any Spark entities? If many of them are Spark-free, we might benefit a lot. If not, we might have to fix most of them to make them generic so that Flink and Java can reuse them.
   > 
   > So, is the intent to use the existing schema provider implementations for Flink and Java as well, or just to fix the duplicated FileBased code?
   
   For the Spark engine, almost all of the `SchemaProvider` implementations depend on Spark entities, so introducing an interface may be a good choice.
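A minimal sketch of the interface-based layering discussed above, in plain Java. It is an illustration under stated assumptions, not Hudi's implementation: so that it compiles with the JDK alone, the schema is carried as an Avro schema JSON `String` instead of `org.apache.avro.Schema`, `Object` stands in for `JavaSparkContext`, and `SparkSchemaProvider`, `FileSchemaProvider` and the `example.schemaprovider.*` property keys are hypothetical names. `getTargetSchema()` defaults to the source schema, mirroring the default in the existing SchemaProvider abstract class. The Spark-specific constructor stays in the Spark base class, while a file-based provider that needs only configuration implements the interface directly and could be shared by every engine.

```java
import java.io.IOException;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Engine-agnostic contract, mirroring the proposed SchemaProviderInterface.
// ASSUMPTION: a schema is an Avro schema JSON string rather than
// org.apache.avro.Schema, so this sketch needs no third-party jars.
interface SchemaProviderInterface extends Serializable {
  String getSourceSchema();

  // By default, the target schema is the same as the source schema.
  default String getTargetSchema() {
    return getSourceSchema();
  }
}

// Hypothetical Spark-only base class: the engine-specific constructor
// arguments (TypedProperties and JavaSparkContext in Hudi) live here, so
// existing subclasses would not need to change. Object is a stand-in for
// the Spark context to avoid a Spark dependency.
abstract class SparkSchemaProvider implements SchemaProviderInterface {
  protected final Properties config;
  protected final transient Object jssc;

  protected SparkSchemaProvider(Properties config, Object jssc) {
    this.config = config;
    this.jssc = jssc;
  }
}

// Hypothetical engine-agnostic provider: it needs only configuration, so
// Spark, Flink and Java writers could share it. The property keys are
// illustrative, not Hudi's actual configuration names.
class FileSchemaProvider implements SchemaProviderInterface {
  static final String SOURCE_FILE_PROP = "example.schemaprovider.source.schema.file";
  static final String TARGET_FILE_PROP = "example.schemaprovider.target.schema.file";

  private final String sourceSchema;
  private final String targetSchema;

  FileSchemaProvider(Properties config) throws IOException {
    String sourcePath = config.getProperty(SOURCE_FILE_PROP);
    if (sourcePath == null) {
      throw new IllegalArgumentException("Missing property " + SOURCE_FILE_PROP);
    }
    this.sourceSchema = read(sourcePath);
    String targetPath = config.getProperty(TARGET_FILE_PROP);
    // Fall back to the source schema when no target file is configured.
    this.targetSchema = targetPath == null ? sourceSchema : read(targetPath);
  }

  // Reads the whole schema file as UTF-8 text.
  private static String read(String path) throws IOException {
    return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
  }

  @Override
  public String getSourceSchema() {
    return sourceSchema;
  }

  @Override
  public String getTargetSchema() {
    return targetSchema;
  }
}
```

With this split, existing Spark providers keep extending the Spark base class unchanged, and engine-agnostic callers depend only on `SchemaProviderInterface`.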


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org